Abstract: This article considers the performance of digital 
communication systems transmitting messages over finite-state 
erasure channels with memory. Information bits are protected 
from channel erasures using error-correcting codes; successful 
receptions of codewords are acknowledged at the source through 
instantaneous feedback. The primary focus of this research is 
on delay-sensitive applications, codes with finite block lengths 
and, necessarily, non-vanishing probabilities of decoding failure. 
The contribution of this article is twofold. A methodology to 
compute the distribution of the time required to empty a buffer 
is introduced. Based on this distribution, the mean hitting 
time to an empty queue and delay-violation probabilities for 
specific thresholds can be computed explicitly. The proposed 
techniques apply to situations where the transmit buffer contains 
a predetermined number of information bits at the onset of the 
data transfer. Furthermore, as additional performance criteria, 
large deviation principles are obtained for the empirical mean 
service time and the average packet-transmission time associated 
with the communication process. This rigorous framework yields 
a pragmatic methodology to select code rate and block length for 
the communication unit as functions of the service requirements. 
Examples motivated by practical systems are provided to further 
illustrate the applicability of these techniques. 

I. Introduction 

Contemporary communication systems must be designed to 
accommodate the various applications that compose today's 
digital landscape. In particular, mobile devices must meet 
the heterogeneous needs of various data flows in terms of 
delay tolerance and bandwidth requirements. On the Internet 
backbone, congestion is often prevented by over-provisioning. 
The large throughput and low latency of parallel optical lines 
provide a pragmatic solution that offers adequate network 
performance. This approach, combined with localized content 
distribution networks and edge throttling, is key in supporting 
delay-sensitive traffic over the Internet core. Unfortunately, 
a similar strategy cannot be applied to connect untethered 
devices, as wireless physical resources are limited and costly. 
The narrow usable spectrum and the broadcast nature of 
wireless environments limit the effective bandwidth of wireless 
access networks and, hence, demand the efficient management 
of available resources. 

In this article, we develop a mathematical framework that 
enables the optimal allocation of link resources for wireless 
systems in the context of delay-sensitive communication. 
Distinguishing features of the proposed methodology include 
the joint treatment of finite-state channels with memory and 
queueing behavior at the transmitter. The focus is on the 
first-passage time to an empty queue, and the methodology 
implicitly provides a distribution for the time it would take an 
additional packet to reach the head of the queue. This view 
is not only important for resource allocation and performance 
evaluation; it also offers a foundation for choosing among possible 
routes and distinct interfaces. From an abstract perspective, we 
introduce a formulation where time-dependencies in channel 
states and decoding failures are captured meticulously. In 
contrast to block-fading models, this formulation allows the 
seamless optimization of parameters such as code rate and 
block length. This is instrumental in better understanding 
how these parameters affect the overall performance of delay- 
sensitive wireless connections. 

Several contributions on the interplay between decisions 
at the physical layer and overall performance at the link 
layer can be found in the literature [1], [2], [3], [4]. Notable 
approaches include the outage capacity [5], [6], a probabilistic 
performance criterion based on the marginal distribution of 
channel blocks; the effective capacity [7], [8], which captures 
the decay rate in buffer occupancy at the transmitter; and 
finite block-length analyses of wireless connections [9], [10]. 
Physical resources can be optimized to reduce average delay 
by carefully selecting advantageous modulation schemes and 
coding strategies [11], [12]. Multi-objective problem formulations 
have also been explored. For instance, the optimal 
tradeoff between power and delay has received attention in 
the past [13]. The joint treatment of queueing and error-control 
coding has been examined by simultaneously considering the 
effective capacity of a link and the error exponent of a code 
family [14], [15]. Markov models have been successfully 
employed in the queueing analysis of communication links 
with automatic repeat request [16], [17]. Finally, powerful 
asymptotic techniques based on large deviations and heavy 
traffic limits have been developed to handle real-time traffic 
over unreliable links [18], [19]. 

This study differs from previous contributions in that it 
relates queueing behavior, error control coding and channel 
evolution without resorting to asymptotically long coding 
delays or rough approximations. Decoding performance at the 
receiver captures channel correlation within a block, while 
the queueing aspect of the problem is key in understanding 
the impact of time-dependencies among successive decoding 
attempts. Together, they provide an accurate assessment of 
overall system performance and lead to novel guidelines about 
efficient designs. 

Furthermore, by focusing on the first-passage time to an 
empty queue [20], we are able to bypass the search for 
representative arrival processes. Rather, resource management can 
be performed adaptively based on current system conditions. 
Having a distribution for the hitting time to an empty buffer 
enables the computation of several pertinent performance 
criteria such as the probability of violating a completion 
deadline, the mean first-passage time to an empty queue, and 
Chernoff bounds. The proposed methodology is closely related 
to generating functions [21] and it works well for reasonably 
small initial buffer sizes, which are typical of communication 
systems subject to stringent delay restrictions. On the other 
hand, under large buffers, this technique becomes somewhat 
cumbersome. In this latter case, analyzing the large deviations 
governing the evolution of the system offers a promising 
new direction to derive meaningful guidelines for resource 
allocation and the selection of system parameters. Indeed, the 
concentration of empirical measures can be used to gracefully 
adjust delay-sensitivity to the needs of real-time data flows 
by selecting the deviation threshold, i.e., the argument of the 
rate function [22]. Once a threshold is set, system parameters 
can be optimized according to this objective function and the 
resulting performance can be predicted accurately. 

Throughout, we assume the availability of reliable 
acknowledgements using periodic feedback. We also assume that 
the transmitter and receiver share a common randomness, 
which permits the utilization of random binary codes. The 
remainder of this article is organized as follows. Section II 
presents the channel model and the random coding scheme. 
The queueing aspect of the problem is developed in Section III. 
A large deviations perspective on the mean transmission time 
and the average service rate is offered in Section IV. The 
findings are supplemented by a discussion of pertinent criteria 
for performance evaluation, along with numerical examples. 
Concluding remarks and possible avenues of future research 
are presented in the final section. 



II. System Model 

One physical aspect of wireless communication that we 
are particularly interested in is channel memory. From a 
queueing perspective, it is well known that correlation over 
time can drastically alter the stationary distribution of a 
queueing system [23], [24]. In a similar manner, channel 
memory can have a strong impact on overall performance, 
as it induces time-dependencies in the service process at 
the transmitter. This phenomenon is especially important for 
delay-sensitive applications that require the reliable, ordered 
delivery of data streams. A prime model class in dealing with 
such dependencies is composed of finite-state channels with 
memory [25], [26], [27]. System models derived from this 
class of channels are typically mathematically tractable, and 
they offer a natural mechanism to account for correlation over 
time. Moreover, insights acquired by studying erasure channels 
can often be translated to error channels or, at least, provide 
partial intuition about promising solutions for the latter, more 
challenging scenario. 

This article focuses on a communication paradigm where 
information bits flow from a source to a destination. The 
transmitter is assumed to possess a message of a certain 
length at the onset of the data transfer, and forward error 
correction is employed to shield content from potential symbol 
erasures. At the beginning of a transmission, the leading 
information bits stored at the source are grouped into a 
segment, and redundancy is added to this message using 
block encoding. The resulting codeword is then sent over a 
finite-state erasure channel with memory. Contingent upon 
the channel realization, the destination can either retrieve the 
data contained in the transmitted codeword or it declares a 
decoding failure. Successful transmissions are acknowledged 
and the corresponding bits are then discarded from the source 
buffer. Otherwise, the leading information bits remain in the 
queue. We emphasize that, in this framework, the original data 
sequence is guaranteed to be transferred unaltered. However, 
the completion time of the queue-emptying process is a 
random variable that depends on the coding/decoding strategy 
employed and on the realization of the channel. 

A. Channel Abstraction 

As indicated above, we capture channel stochasticity and its 
impact on the communication link using a finite-state Markov 
process. Several pertinent communication scenarios can be 
modeled in this manner [28], [29], [30]. At a particular time 
instant, we assume that the channel can be in one of $k$ states 
taking value in $\mathcal{C} = \{1, 2, \ldots, k\}$. State transitions over time 
form a Markov chain. We denote the corresponding transition 
probability matrix by 

$$
\mathbf{B} = \begin{bmatrix}
b_{11} & b_{12} & \cdots & b_{1k} \\
b_{21} & b_{22} & \cdots & b_{2k} \\
\vdots & \vdots & \ddots & \vdots \\
b_{k1} & b_{k2} & \cdots & b_{kk}
\end{bmatrix} .
$$

Entry $b_{ij}$ in matrix $\mathbf{B}$ represents the conditional probability 
that, starting from state $i$, the channel transitions to state $j$. 
As such, $\mathbf{B}$ is a right stochastic matrix. When in state $i$, 
the transmitted symbol is erased with probability $\varepsilon_i$ and, 
consequently, it is received correctly with probability $1 - \varepsilon_i$. 
For notational convenience, we impose a quality ordering 
on the channel states, i.e., $\varepsilon_i > \varepsilon_j$ whenever $i < j$. We 
represent the state of the channel at time instant $n$ by $C_n$. 
We note that $\{C_n\}$ is a first-order Markov process. A diagram 
illustrating the operation of the communication link for a 
two-state channel appears in Fig. 1. 

Assumption 1: Throughout, we hypothesize that the chain 
governing the finite-state channel is irreducible and aperiodic. 
We also assume that this Markov channel is non-trivial in that 
there exists a state $i \in \mathcal{C}$ such that $\varepsilon_i < 1$. 

As we shall see, these conditions guarantee the existence of 
a random coding scheme for which the transmission process 
terminates in finite time, almost surely. These transmission 
schemes are the only ones of interest for our purpose. In that 
sense, Assumption 1 is introduced to prevent difficulties that 
arise from idiosyncratic, irrelevant scenarios. 
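
The conditions in Assumption 1 can be checked numerically for a candidate channel. The following is a minimal sketch, not part of the original development: the function name `check_assumption_1` and the example matrices are hypothetical, and the test for irreducibility and aperiodicity relies on the standard fact that a finite chain has both properties exactly when some power of its transition matrix is entrywise positive (Wielandt's bound limits the powers that need to be examined).

```python
import numpy as np

def check_assumption_1(B, eps):
    """Sketch: does the pair (B, eps) satisfy Assumption 1?"""
    B = np.asarray(B, dtype=float)
    eps = np.asarray(eps, dtype=float)
    k = B.shape[0]
    # B must be right stochastic: nonnegative entries, unit row sums.
    if B.shape != (k, k) or np.any(B < 0):
        return False
    if not np.allclose(B.sum(axis=1), 1.0):
        return False
    # Irreducible and aperiodic <=> primitive: some power of the 0/1
    # adjacency pattern is entrywise positive. By Wielandt's bound it
    # suffices to examine exponents 1, ..., (k - 1)^2 + 1.
    reach = (B > 0).astype(int)
    power = reach.copy()
    primitive = False
    for _ in range((k - 1) ** 2 + 1):
        if np.all(power > 0):
            primitive = True
            break
        power = ((power @ reach) > 0).astype(int)
    # Non-trivial channel: at least one state with erasure probability < 1.
    return primitive and bool(np.any(eps < 1.0))
```

For instance, a two-state channel with transition matrix `[[0.9, 0.1], [0.2, 0.8]]` and erasure probabilities `[0.9, 0.1]` passes the check, whereas a chain that deterministically alternates between two states fails it because it is periodic.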



Fig. 1. Communication at the bit level takes place over a finite-state erasure 
channel with memory. While in state $i$, the probability of a bit erasure is $\varepsilon_i$. 
The evolution of the channel over time forms a Markov process. 



B. Coding Scheme 

The envisioned system employs forward error correction to 
counteract possible channel erasures. A codeword transmission 
attempt is initiated by selecting the leading K bits from the 
source buffer. Redundancy is then added to this data segment 
through the encoding process. A random coding scheme is 
adopted as a mathematically convenient abstraction to realistic 
implementations [1], [31]. To create each codeword transmission, 
a random binary parity check matrix of size $(N-K) \times N$ 
is generated. Every entry is selected uniformly over the binary 
alphabet, independently from other elements. The resulting 
codebook corresponds to the nullspace of this matrix. We 
assume that maximum-likelihood decoding is performed at the 
receiver. 

We emphasize that this mode of operation requires shared 
randomness at the source and the destination. Interestingly, 
this coding scheme is known to perform well for large block 
lengths, and it supports flexible rates of communication: any 
rate of the form $K/N$ where $0 < K < N$ is admissible. 
These random codes have the additional property that the 
average probability of decoding failure depends only on the 
number of erasures caused by the channel and not on the 
specific locations of these erasures. Provided that e erasures 
have occurred during transmission, the probability of decoding 
failure can be evaluated explicitly, 



$$
P_{\mathrm{f}}(N-K, e) = 1 - \prod_{i=0}^{e-1} \left( 1 - 2^{\,i-(N-K)} \right) . \tag{1}
$$



A proof for this statement is based on the equivalence between 
the linear independence of the e erased columns in the parity 
check matrix and the event of a successful decoding OP . 
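
Expression (1) is straightforward to evaluate numerically. The sketch below is illustrative only; the function name `decoding_failure_prob` and its argument names are hypothetical, with `p` standing for the number of parity checks $N - K$ and `e` for the number of erasures.

```python
def decoding_failure_prob(p, e):
    """Evaluate (1): failure probability with p = N - K parity checks
    and e erased symbols, under the random parity-check ensemble."""
    prob_independent = 1.0
    for i in range(e):
        # Probability that the (i + 1)-th erased column lies outside
        # the span of the i columns already accounted for.
        prob_independent *= 1.0 - 2.0 ** (i - p)
    return 1.0 - prob_independent
```

When `e` exceeds `p`, the factor at index `i = p` vanishes and the function returns one, which matches the fact that more than $N - K$ columns of length $N - K$ cannot be linearly independent.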



C. Distribution of Erasures 

From the discussion above, we gather that the number 
of erasures suffered by a codeword plays a critical role in 
determining overall system performance, as it dictates the 
probability of decoding failure. This random variable thus 
warrants due attention. Let E denote the number of erasures 
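occurring in a given packet transmission. We can describe 
its distribution in a compact form using matrix generating 
functions. Define matrix $\mathbf{B}_x$ by 

$$
\mathbf{B}_x = \begin{bmatrix}
b_{11}(1 - \varepsilon_1 + \varepsilon_1 x) & \cdots & b_{1k}(1 - \varepsilon_1 + \varepsilon_1 x) \\
\vdots & \ddots & \vdots \\
b_{k1}(1 - \varepsilon_k + \varepsilon_k x) & \cdots & b_{kk}(1 - \varepsilon_k + \varepsilon_k x)
\end{bmatrix} .
$$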



Furthermore, let $[x^n]$ denote the linear operator that maps a 
polynomial in $x$ to the coefficient of $x^n$. For $e \in \mathbb{N}_0$ and 
$i, j \in \mathcal{C}$, one can show that ETTl 

$$
\Pr\left( E = e, C_{N+1} = j \mid C_1 = i \right) = [x^e] \left[ \mathbf{B}_x^N \right]_{ij} \tag{2}
$$

where, in this case, $E$ denotes the number of erasures over 
an interval of length $N$. The probability that Markov process 
$\{C_n\}$ coincides with a specific sequence of states is equal to 
the probability of a certain path through the matching trellis. 
Moreover, at each point in time, the probability of observing an 
erasure only depends on the current state. Consequently, taking 
the $N$th power of matrix $\mathbf{B}_x$ is an efficient way to compute 
the aggregate conditional probability of observing exactly $e$ 
erasures, given an initial probability distribution and an end 
state. In other words, $\mathbf{B}_x^N$ offers a way to simultaneously sum 
all the relevant paths through the trellis. It is also possible to 
compute such probabilities through nested summations [32], 
but the ensuing equations rapidly become cumbersome for 
large values of $N$ and Markov chains with sizable state spaces. 
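
One possible way to evaluate (2) numerically is to store the coefficient matrices of the polynomial matrix and multiply by $\mathbf{B}_x$ a total of $N$ times. The sketch below assumes NumPy; the function name `erasure_distribution` and the array layout `A[e, i, j]` are choices made for this illustration, not notation from the article.

```python
import numpy as np

def erasure_distribution(B, eps, N):
    """Return A with A[e, i, j] = Pr(E = e, C_{N+1} = j | C_1 = i),
    i.e., the coefficient of x^e in entry (i, j) of B_x^N."""
    B = np.asarray(B, dtype=float)
    eps = np.asarray(eps, dtype=float)
    k = B.shape[0]
    # B_x = B0 + B1 * x, where row i is weighted by the probability of
    # a correct reception (B0) or an erasure (B1) in state i.
    B0 = (1.0 - eps)[:, None] * B
    B1 = eps[:, None] * B
    A = np.zeros((N + 1, k, k))
    A[0] = np.eye(k)  # B_x^0 is the identity, a degree-zero polynomial
    for _ in range(N):
        nxt = A @ B0              # terms that keep the erasure count
        nxt[1:] += A[:-1] @ B1    # terms that add one erasure
        A = nxt
    return A
```

Summing the output over the erasure index recovers the $N$-step transition matrix $\mathbf{B}^N$, which offers a quick sanity check. Each multiplication costs on the order of $N k^3$ operations, so the whole computation scales as $N^2 k^3$.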

Given initial state i and for a fixed final state j, we can 
apply the total probability theorem to compute the probability 
of decoding failure, 

$$
\sum_{e=0}^{N} P_{\mathrm{f}}(N-K, e) \Pr\left( E = e, C_{N+1} = j \mid C_1 = i \right) . \tag{3}
$$

These conditional probabilities, along with the progression 
of the channel states, underlie the evolution of the queueing 
system. 

Remark 1: As a side note, it is instructive to point out that, 
under Assumption 1, there exist values for $N$ and $K$ such that 
the probability of decoding success as a function of $C_1$ is not 
uniformly zero. In particular, if $i$ is a channel state such that 
$\varepsilon_i < 1$, then for large enough $N$ and $N - K$, the probability 
of decoding failure in (3) will be less than one. Random codes 
for which the conditional probability of decoding success is 
not uniformly zero are termed non-trivial. 

III. Queueing Model 

This section describes the queueing behavior of our system. 
First, we assume that the number of information bits present 
at the source at the beginning of the communication process 
is fixed and equal to $\ell$. Given a code rate and block length, 
the source takes the leading K data bits and encodes the 
resulting segment into a codeword of length N using the 
scheme described in the preceding section. This codeword 
is then sent to the destination through N consecutive uses 
of the erasure channel. A service opportunity occurs every 
time the random code and channel realization jointly permit 
reliable decoding. We emphasize, again, that the destination is 
assumed to possess the ability to acknowledge the successful 
reception of codewords through instantaneous feedback. As 
such, the selected information bits remain in the transmit 
queue until a corresponding codeword is decoded faithfully 
at the destination. This data segment is immediately discarded 
from the buffer upon successful decoding of a packet. 

In its simplest form, this scheme represents a variation 
of automatic repeat request (ARQ). We note that this mode 
of operation is somewhat naive in that the information con- 
tained in failed decoding attempts is disregarded. A more 
astute implementation will seek to leverage past failures by 
performing joint decoding over all the observed messages 
pertaining to the current data segment. Incremental redundancy 
and hybrid automatic repeat request are valuable techniques 
that can improve performance [33], [34], [35]. In this article, 
we discuss both ARQ and its hybrid variant, where partial 
information from failed transmission attempts is incorporated 
in the decoding process. Still, we focus largely on the rudi- 
mentary scheme because it admits a simpler, more elegant 
characterization while preserving the natural tradeoff between 
error protection and payload content. Overall, the proposed 
methodology yields pertinent results that help improve our 
understanding of delay-sensitive systems. 

Our primary interest lies in the distribution of the time 
elapsed until the message originally contained in the source 
buffer becomes wholly available at the destination. To capture 
this quantity adequately, we need to examine the evolution of 
the queue. The length of the queue can be expressed in terms 
of the number of data segments awaiting transmission. If a 
queue initially contains $\ell$ information bits, then it will require 
the successful transmission of $m = \lceil \ell / K \rceil$ codewords before 
the last segment gets processed. The number of segments in 
the transmit buffer therefore becomes a measure of residual 
work until our objective is met, and it is intrinsically linked 
to the state of our communication system. 

For $N$ fixed, we can denote the size of the queue at the 
onset of codeword $s$ by $Q_s$. We note that the state of the 
bit-erasure channel at that same time instant is $C_{sN+1}$. The rapid 
succession of symbols in the bit-erasure channel compared to 
events taking place in the queue produces the mismatch in 
indexing between $Q_s$ and $C_{sN+1}$. Indeed, queue transitions 
are only possible at the completions of decoding attempts, 
which only occur after every $N$ symbol transmissions. The 
resulting stochastic process $\{Q_s\}$ is a hidden Markov process, 
as it is determined partly by the evolution of the unobserved 
channel process $\{C_n\}$. While $\{Q_s\}$ alone does not possess the 
Markov property, it is possible to create an augmented process 
containing $Q_s$ with this desirable attribute. The particulars 
of the procedure depend on whether one is considering the 
standard ARQ framework or its hybrid variant. We thus treat 
these two instances separately. 

A. Automatic Repeat Request 

As the title suggests, this section focuses exclusively on the 
scenario where the source and the destination employ ARQ 
to overcome channel erasures and, thereby, achieve reliable 
data transmission. In particular, the information contained in 
past decoding attempts is disregarded by the decoder when 
receiving the latest codeword. To build a suitable model, we 
consider the random vector $U_s = (C_{sN+1}, Q_s)$ composed 
of channel state and queue length. We wish to show that 
this vector contains all the relevant information to track the 
evolution of the system. 

Theorem 1: The aggregate process $\{U_s\}_{s \geq 0}$ possesses the 
Markov property. That is, conditional on $U_t = (i, q)$, the 
stochastic process $\{U_{s+t}\}_{s \geq 0}$ is independent of $U_0, \ldots, U_{t-1}$. 
Proof: See Appendix. ■ 

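Using the total probability theorem, we can write the 
transition probabilities of $\{U_s\}$ as follows, 

$$
\begin{aligned}
\Pr\left( U_{s+1} = (j, q_{s+1}) \mid U_s = (i, q_s) \right)
= \sum_{e=0}^{N} & \Pr\left( Q_{s+1} = q_{s+1} \mid E = e, Q_s = q_s \right) \\
& \times \Pr\left( E = e, C_{(s+1)N+1} = j \mid C_{sN+1} = i \right)
\end{aligned} \tag{4}
$$

where $i, j \in \mathcal{C}$. For a non-empty queue, the first part of each 
summand corresponds to one of three possible cases, 

$$
\Pr\left( Q_{s+1} = q_{s+1} \mid E = e, Q_s = q_s \right) =
\begin{cases}
P_{\mathrm{f}}(N-K, e), & q_{s+1} = q_s \\
1 - P_{\mathrm{f}}(N-K, e), & q_{s+1} = q_s - 1 \\
0, & \text{otherwise.}
\end{cases} \tag{5}
$$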

The probability of decoding failure $P_{\mathrm{f}}(\cdot, \cdot)$ appears in (1), 
while the conditional distribution of erasures within a block 
is given in (2). Thus, we have already developed the tools 
necessary to efficiently compute the value of every transition 
probability in (4). The evolution of the queueing system and 
its admissible transitions are depicted graphically in Fig. 2. 
The states $\{(\cdot, q)\}$ are collectively referred to as the $q$th level 
of the queue. The first-passage time to an empty buffer is 
therefore equivalent to the hitting time to level zero. Due to 
the repetitive structure of this augmented system, the hitting 
time to a lower level will play a key role in finding a tractable 
solution to the problem at hand. 

Fig. 2. This figure illustrates the progression of the queueing system for 
a service process that is governed by a two-state Markov erasure channel. 
System states, which are composed of queue lengths and channel states, are 
represented by circles. Admissible transitions are marked by the arrows. 

An additional quantity of interest in the analysis of 
delay-sensitive systems is the mean service rate. To compute this 
quantity, it is convenient to analyze the service process $\{D_s\}$, 
where $D_s$ indicates the potential of a successful decoding 
event at time $s$. That is, $D_s = 1$ when a message can (or 
could) be decoded faithfully at the destination; and $D_s = 0$ 
otherwise. In words, the sequence $\{D_s\}$ indicates time instants 
at which blocks of information can be transferred successfully 
to the destination. As in the case of the queueing abstraction, 
the stochastic process $\{D_s\}$ forms a hidden Markov process 
which can be lifted to an augmented Markov process. Let 
$V_s = (C_{(s+1)N+1}, D_s)$ denote a random vector composed of 
the state of the erasure channel at the onset of block $s + 1$, 
together with the indicator of a service opportunity during 
block $s$. As in Theorem 1, one can show that the stochastic 
process $\{V_s\}$ forms a Markov chain. 

We note that the transition probabilities of $\{D_s\}$ are closely 
related to those of $\{Q_s\}$. Since there are no arrivals in our 
framework, the evolution of these processes is governed by 

$$
Q_{s+1} = (Q_s - D_s)^{+} .
$$

For convenience, we establish a succinct notation for the 
transition probabilities of our two augmented processes, 

$$
\begin{aligned}
\kappa_{ij} &= \Pr\left( U_{s+1} = (j, q) \mid U_s = (i, q) \right)
 = \Pr\left( V_{s+1} = (j, 0) \mid V_s = (i, d) \right) \\
\mu_{ij} &= \Pr\left( U_{s+1} = (j, q-1) \mid U_s = (i, q) \right)
 = \Pr\left( V_{s+1} = (j, 1) \mid V_s = (i, d) \right)
\end{aligned}
$$

where $q \in \mathbb{N}$, $i, j \in \mathcal{C}$ and $d \in \{0, 1\}$. These common 
definitions draw further attention to the close connection 
between $\{U_s\}$ and $\{V_s\}$. 
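
Combining (1), (2), (4) and (5), the probabilities $\kappa_{ij}$ and $\mu_{ij}$ can be assembled numerically by weighting the erasure distribution with the failure and success probabilities of the code. The following sketch assumes NumPy; the names `failure_prob`, `arq_transition_blocks` and `K_bits` are hypothetical and chosen only for this illustration.

```python
import numpy as np

def failure_prob(p, e):
    """Equation (1) with p = N - K parity checks and e erasures."""
    prod = 1.0
    for i in range(e):
        prod *= 1.0 - 2.0 ** (i - p)
    return 1.0 - prod

def arq_transition_blocks(B, eps, N, K_bits):
    """Return (kappa, mu): k-by-k arrays holding the probabilities of a
    failed and a successful decoding attempt, jointly with the channel
    moving from state i to state j over one codeword of length N."""
    B = np.asarray(B, dtype=float)
    eps = np.asarray(eps, dtype=float)
    k = B.shape[0]
    B0 = (1.0 - eps)[:, None] * B   # symbol received in state i
    B1 = eps[:, None] * B           # symbol erased in state i
    # A[e, i, j] = Pr(E = e, C_{N+1} = j | C_1 = i), as in (2).
    A = np.zeros((N + 1, k, k))
    A[0] = np.eye(k)
    for _ in range(N):
        nxt = A @ B0
        nxt[1:] += A[:-1] @ B1
        A = nxt
    pf = np.array([failure_prob(N - K_bits, e) for e in range(N + 1)])
    kappa = np.tensordot(pf, A, axes=1)       # queue length unchanged
    mu = np.tensordot(1.0 - pf, A, axes=1)    # queue length drops by one
    return kappa, mu
```

Since every decoding attempt either fails or succeeds, `kappa + mu` should equal the $N$-step channel transition matrix $\mathbf{B}^N$; this identity is a convenient check on any implementation.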

In view of Remark 1 and for non-trivial codes, there exist 
$i, j \in \mathcal{C}$ such that $\mu_{ij} > 0$. This implies that the states associated 
with an empty buffer form the only closed communicating 
class and, as such, the remaining states are transient [20]. Since 
the number of states in the augmented chain is finite, this 
structure ensures that the task of emptying the transmit buffer 
is carried out in finite time, almost surely. 
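
The absorption of probability mass into the empty-buffer level can be observed directly by propagating the state distribution of the augmented chain. The sketch below is one simple way to do so and is not the generating-function method of the article; the function name `first_passage_pmf`, the initial channel distribution `pi0` and the truncation parameter `horizon` are assumptions of this example, and time is counted in codeword transmissions.

```python
import numpy as np

def first_passage_pmf(kappa, mu, m, pi0, horizon):
    """Return pmf with pmf[s] = Pr(first-passage time to empty = s),
    s = 0..horizon, for a buffer that initially holds m segments."""
    kappa = np.asarray(kappa, dtype=float)
    mu = np.asarray(mu, dtype=float)
    k = kappa.shape[0]
    # level[q] is the sub-distribution over channel states at the
    # instants where q segments remain in the buffer.
    level = np.zeros((m + 1, k))
    level[m] = pi0
    pmf = np.zeros(horizon + 1)
    pmf[0] = level[0].sum()           # nonzero only when m == 0
    for s in range(1, horizon + 1):
        nxt = np.zeros_like(level)
        nxt[1:] += level[1:] @ kappa  # decoding failure: same level
        nxt[:-1] += level[1:] @ mu    # decoding success: one level down
        pmf[s] = nxt[0].sum()         # mass reaching level zero at time s
        nxt[0] = 0.0                  # record absorption once, then drop it
        level = nxt
    return pmf
```

The mean first-passage time and delay-violation probabilities then follow from the returned vector, for example `(np.arange(len(pmf)) * pmf).sum()` and `1.0 - pmf[:deadline + 1].sum()`, up to the truncation error introduced by a finite `horizon`.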

The symmetric decomposition of the queueing system into 
levels suggests an approach based on the quasi-birth-death 
structure of the chain. Suppose that the buffer contains exactly 
$m$ data segments at time zero, i.e., $Q_0 = m$. We can define 
the hitting time from level $m$ to level $q$ of the chain as 

$$
H_q = \inf \{ s \geq 0 \mid Q_s = q \}, \tag{6}
$$

where $0 \leq q < m$. That is, $H_q$ designates the time instant at 
which the process $\{U_s\}$ first enters the $q$th level of the queue. 
We emphasize that, under the mild assumptions discussed 
above, $H_q$ is almost surely finite. For consistency, we also 
define $H_m = 0$. Noting that $Q_s$ is a non-increasing process, 
we can write the sojourn time at level $q$ as 

$$
T_q = H_{q-1} - H_q
$$

where $0 < q \leq m$. That is, random variable $T_q$ denotes the 
amount of time $\{U_s\}$ stays at level $q$ before leaving for the 
subsequent lower level. 

We are especially interested in $H_0$, the first-passage time 
to an empty queue. Taking advantage of the structure of the 
augmented Markov chain, we can fragment $H_0$ into a sum of 
elementary components. Specifically, the hitting time $H_0$ is 
equal to the sum of the sojourn times $T_1, \ldots, T_m$, i.e., 

$$
H_0 = \sum_{q=1}^{m} T_q .
$$




Fig. 3. This reduced Markov diagram represents one of the quasi-birth-death 
subcomponents of the queueing system. Starting from any distribution 
over these four states, it is possible to characterize the sojourn time $T$ spent 
at level one. This is a key step in deriving the first-passage time to an empty 
buffer. 



We note that these sojourn times are not independent, but they 
are conditionally independent given the channel states 
$\{C_{N H_q + 1}\}$ at the level-transition instants. Consequently, a powerful means to 
compute the distribution of $H_0$ is to employ matrix generating 
functions, as described below. Consider a reduced Markov 
chain composed of states $\{(\cdot, 0), (\cdot, 1)\}$, as shown in Fig. 3 for 
a Gilbert-Elliott channel. This Markov diagram can be used 
to study sojourn time $T$, the time spent at a specific level of 
the chain. 

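Under proper state ordering, we can write the transition 
probability matrix for the reduced subsystem as 

$$
\mathbf{P} = \begin{bmatrix} \mathbf{I} & \mathbf{0} \\ \mathbf{M} & \mathbf{K} \end{bmatrix} \tag{7}
$$

where we have implicitly defined matrices 

$$
\mathbf{K} = \begin{bmatrix}
\kappa_{11} & \cdots & \kappa_{1k} \\
\kappa_{21} & \cdots & \kappa_{2k} \\
\vdots & \ddots & \vdots \\
\kappa_{k1} & \cdots & \kappa_{kk}
\end{bmatrix} ,
\qquad
\mathbf{M} = \begin{bmatrix}
\mu_{11} & \cdots & \mu_{1k} \\
\mu_{21} & \cdots & \mu_{2k} \\
\vdots & \ddots & \vdots \\
\mu_{k1} & \cdots & \mu_{kk}
\end{bmatrix} .
$$

We emphasize that $\mathbf{P}$ is a stochastic matrix. As a consequence 
of the Perron-Frobenius theorem, we know that the spectral 
radius associated with $\mathbf{P}$ is one [36]. 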

To handle conditional independence among the sojourn 
times $\{T_q\}$, we again utilize the concept of generating 
functions extended to matrices [21]. This more intricate version is 
necessary to keep track of the channel state entered after each 
downward queue transition. Define matrix generating function 
$\mathbf{G}_T(z)$ entrywise by 

$$
\left[ \mathbf{G}_T(z) \right]_{ij} = \mathrm{E}\left[ z^{T} \mathbf{1}_{\{C_{NT+1} = j\}} \mid C_1 = i \right] \tag{8}
$$

where $\mathbf{1}_{\{\cdot\}}$ is the standard set indicator function. 

Lemma 1: For the reduced subsystem associated with (7), 
the matrix generating function $\mathbf{G}_T(z)$ is equal to 

$$
\mathbf{G}_T(z) = (\mathbf{I} - \mathbf{K} z)^{-1} \mathbf{M} z . \tag{9}
$$



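Proof: The matrix generating function $\mathbf{G}_T(z)$ can be 
obtained by treating the entries of $\mathbf{P}$ as real polynomials in $z$, 
with 

$$
\mathbf{P}_z = \begin{bmatrix} \mathbf{I} & \mathbf{0} \\ \mathbf{M} z & \mathbf{K} z \end{bmatrix} .
$$

Consider the two states $(i, 1)$ and $(j, l)$. Their indices in the 
ordering associated with $\mathbf{P}$ are $k + i$ and $lk + j$, respectively. 
As before, $[z^t]$ denotes the operator that maps a polynomial 
in $z$ to the coefficient of $z^t$. Suppose that, at time zero, the 
reduced system starts in state $(i, 1)$. By construction, the joint 
probability that this system is in state $(j, l)$ at time $s \geq 0$ and 
has spent exactly $t$ steps at queue-level one can be expressed 
as 

$$
\Pr\left( S_s = t, U_s = (j, l) \mid U_0 = (i, 1) \right) = [z^t] \left[ \mathbf{P}_z^s \right]_{k+i, \, lk+j} ,
$$

where $S_s$ represents the total time spent at level 1 over the 
interval from zero to instant $s$. The generating matrix $\mathbf{G}_T(z)$ 
can be obtained by taking the limit 

$$
\mathbf{G}_T(z)
= \lim_{s \to \infty} \begin{bmatrix} \mathbf{0} & \mathbf{I} \end{bmatrix} \mathbf{P}_z^s \begin{bmatrix} \mathbf{I} \\ \mathbf{0} \end{bmatrix}
= \begin{bmatrix} \mathbf{0} & \mathbf{I} \end{bmatrix}
\begin{bmatrix} \mathbf{I} & \mathbf{0} \\ (\mathbf{I} - \mathbf{K} z)^{-1} \mathbf{M} z & \mathbf{0} \end{bmatrix}
\begin{bmatrix} \mathbf{I} \\ \mathbf{0} \end{bmatrix}
= (\mathbf{I} - \mathbf{K} z)^{-1} \mathbf{M} z .
$$

The above equation holds for all $|z| < \rho(\mathbf{K})^{-1}$, where $\rho(\cdot)$ 
denotes the spectral radius of its matrix argument. The relation 
between the matrix $\mathbf{G}_T(z)$ and the scalar $G_T(z)$ follows immediately from (8) and 
the fact that $G_T(z) = \mathrm{E}\left[ z^T \right]$. ■ 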
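
As an illustrative consequence of Lemma 1, offered here as a sketch rather than a result stated in the text, the closed form in (9) can be expanded as a geometric series to read off the joint distribution of the sojourn time and the channel state entered upon leaving level one. The expansion assumes $\rho(\mathbf{K}) < 1$, so that the unit circle lies inside the region of convergence; the all-ones column vector $\mathbf{1}$ is notation introduced only for this example.

```latex
\begin{align*}
\mathbf{G}_T(z) &= (\mathbf{I} - \mathbf{K} z)^{-1} \mathbf{M} z
  = \sum_{t=1}^{\infty} \mathbf{K}^{t-1} \mathbf{M} \, z^{t}, \\
\Pr\left( T = t,\ C_{NT+1} = j \mid C_1 = i \right)
  &= \left[ \mathbf{K}^{t-1} \mathbf{M} \right]_{ij}, \qquad t \geq 1, \\
\mathrm{E}\left[ T \mid C_1 = i \right]
  &= \left[ \mathbf{G}_T'(1) \, \mathbf{1} \right]_i
  = \left[ (\mathbf{I} - \mathbf{K})^{-2} \mathbf{M} \mathbf{1} \right]_i
  = \left[ (\mathbf{I} - \mathbf{K})^{-1} \mathbf{1} \right]_i .
\end{align*}
```

The last equality uses $\mathbf{M} \mathbf{1} = (\mathbf{I} - \mathbf{K}) \mathbf{1}$, which holds because the rows of the stochastic matrix $\mathbf{P}$ in (7) sum to one. In words, the sojourn time consists of $t - 1$ failed decoding attempts followed by one success.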

B. Hybrid Automatic Repeat Request 

Hybrid ARQ is a mechanism that seeks to incorporate the 
partial information contained in failed transmissions into the 
subsequent decoding attempts of the same data segment. In 
this sense, it differs significantly from ARQ only when the 
initial decoding of a data segment fails. For finite-state erasure 
channels with memory, the evolution of a hybrid ARQ system 
can be characterized completely, although in a somewhat 
cumbersome manner. To implement hybrid ARQ with random 
codes, we must modify our coding strategy slightly. 

Herein, we focus on hybrid schemes with finite depths. That 
is, the transmitter-receiver pair has a predetermined number 
of tries to successfully transmit a data segment. Our favored 
implementation relies on puncturing random codes. In a way 
analogous to our previous approach, we generate a codebook 
by creating a random binary parity check matrix of size 
(dN - K) x dN, where d is the depth of the hybrid ARQ 
scheme. Again, the entries are selected uniformly from the 
binary alphabet and the codebook is equal to the nullspace of 
this matrix. The hybrid ARQ scheme progresses as follows. 
First, an information segment is mapped to a codeword. 
During the initial transmission, the leading N symbols of this 
codeword are sent over the erasure channel. Upon completion 
of this phase, the destination tries to recover the original 
data segment. When decoding fails, the next N symbols are 
sent and the aggregate message is run through a maximum- 
likelihood decoder. This process continues, communicating N 
symbols at a time, until the message is successfully decoded 
at the destination or the total number of attempts reaches its 
limit. 

Since untransmitted symbols can be classified as erasures 
for the purpose of decoding, we can leverage (1) in assessing 
the probabilities of decoding failure at the destination. That 
is, when $s$ codeword chunks are present at the destination, out 
of which a total of $e$ symbols are erased, the probability of 
decoding failure can be written as 

$$
P_{\mathrm{f}}\left( dN - K, e + (d-s)N \right)
= 1 - \prod_{i=0}^{e+(d-s)N-1} \left( 1 - 2^{\,i-(dN-K)} \right) . \tag{10}
$$



Comparing this expression for $s = 1$ and $d > 1$ to (1), we 
gather that the probability of decoding failure after receiving 
one chunk of length N for the hybrid ARQ scheme differs 
from the probability of failure in standard ARQ. Indeed, there 
is a slight initial penalty resulting from using a random code 
tailored to hybrid ARQ. The following proposition establishes 
a uniform bound on the loss in performance associated with 
a hybrid scheme. 

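Proposition 1: Suppose that $p$ and $e$ are fixed, positive 
integers. The function of $n$ defined by 

$$
P_{\mathrm{f}}(p + n, e + n) =
\begin{cases}
1 - \prod_{i=0}^{e+n-1} \left( 1 - 2^{\,i-(p+n)} \right) & \text{if } e \leq p \\
1 & \text{if } e > p
\end{cases}
$$

is monotone increasing. Furthermore, the difference between 
this function and $P_{\mathrm{f}}(p, e)$ is uniformly bounded, 

$$
P_{\mathrm{f}}(p + n, e + n) - P_{\mathrm{f}}(p, e) < 2^{-p} .
$$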

Proof: See Appendix. ■ 
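
The bound in Proposition 1 can be probed numerically. In the hybrid ARQ setting, the first decoding attempt corresponds to $p = N - K$ and $n = (d-1)N$, since $p + n = dN - K$ matches the first argument in (10) when $s = 1$. The sketch below is a sanity check under these assumptions; the function names `failure_prob` and `first_attempt_penalty` are hypothetical.

```python
def failure_prob(p, e):
    """Equation (1) with p parity checks and e erasures."""
    prod = 1.0
    for i in range(e):
        prod *= 1.0 - 2.0 ** (i - p)
    return 1.0 - prod

def first_attempt_penalty(p, e, n):
    """P_f(p + n, e + n) - P_f(p, e): the increase in failure
    probability that Proposition 1 bounds by 2^(-p)."""
    return failure_prob(p + n, e + n) - failure_prob(p, e)
```

For example, with `p = 10` parity checks the penalty stays below `2 ** -10`, roughly 0.001, regardless of the number of erasures or the hybrid ARQ depth.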
As an immediate consequence of this proposition, we know 
that the penalty incurred in using hybrid ARQ, in terms of 
decoding failure at the first attempt, remains very small for 
typical scenarios. This brings credibility to employing a punctured 
random code in our analysis. Moreover, the queue occupancy of 
a hybrid ARQ system with depth greater than or equal to $d$ can 
be lower bounded by assuming that the decoding of a message 
always succeeds on the $d$th attempt; we call this the optimistic 
system. Similarly, the queue occupancy for a system with 
depth $d$ can be upper bounded by assuming that, whenever the 
decoding fails at the $d$th attempt, previously received symbols 
are discarded altogether and the process starts anew; we call 
this the pessimistic system. These strategies jointly produce a 
near complete characterization of the behavior of hybrid ARQ 
systems. We turn to the specifics of the proposed approaches 
below. 

Using random codes over erasure channels leads to some 
highly desirable properties for the hybrid ARQ problem. These 
properties are, in turn, instrumental in finding expressions for 
the probabilities of success at intermediate decoding attempts. 
Suppose that a codebook is generated using a $(dN - K) \times dN$
parity check matrix. For this specific code, if decoding fails
given the first $sN$ received symbols (including erasures), then
it will necessarily be impossible to decode the message using
the leading $(s - 1)N$ received symbols. This nesting is in stark
contrast to error channels.

We employ $P_{\mathrm{s}}^{(s)}(j|i)$ and $P_{\mathrm{f}}^{(s)}(j|i)$ to denote the conditional
probabilities of decoding success and decoding failure at attempt
$s$ with final state $j$, given initial state $i$. The conditional
probabilities of failure are equal to

$$P_{\mathrm{f}}^{(s)}(j|i) = \sum_{e=0}^{sN} P_{\mathrm{f}}(dN-K,\, e+(d-s)N)\, \Pr\left(E_{sN} = e,\, C_{sN+1} = j \mid C_1 = i\right).$$



Above, $E_{sN}$ represents the number of erasures over the
discrete interval $[1, sN]$. Given the probabilities of failure events,
the conditional probabilities of success can be evaluated in a
recursive fashion. The probability of a success at time one
with final state $j$, given initial state $i$, can be written as

$$P_{\mathrm{s}}^{(1)}(j|i) = \Pr(C_{N+1} = j \mid C_1 = i) - P_{\mathrm{f}}^{(1)}(j|i).$$

We note that this equation is the complement of the corresponding
expression introduced earlier, with a convenient new notation and
appropriate parameters. Similarly,
the conditional probability of being able to decode for the first
time at attempt two with final state $j$ and under initial state $i$,
after averaging over the intermediate state $l$, is

$$P_{\mathrm{s}}^{(2)}(j|i) = \Pr(C_{2N+1} = j \mid C_1 = i) - P_{\mathrm{f}}^{(2)}(j|i) - \sum_{l \in \mathcal{C}} P_{\mathrm{s}}^{(1)}(l|i) \Pr(C_{2N+1} = j \mid C_{N+1} = l).$$

Extending this procedure, we can compute the probability of
a decoding success at attempt $s$ with final state $j$, given initial
state $i$,

$$P_{\mathrm{s}}^{(s)}(j|i) = \Pr(C_{sN+1} = j \mid C_1 = i) - P_{\mathrm{f}}^{(s)}(j|i) - \sum_{r=1}^{s-1} \sum_{l \in \mathcal{C}} P_{\mathrm{s}}^{(r)}(l|i) \Pr(C_{sN+1} = j \mid C_{rN+1} = l).$$

This methodology thus provides a recursive and efficient
way to compute the probabilities that, under hybrid ARQ, a
system takes exactly $s$ coded chunks to decode the original
message. As in Section III-A, one can also compute the matrix
generating function of $T$, the time spent in the first level of
the reduced Markov chain.
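The recursion above can be sketched numerically as follows. This Python example assumes a finite-state channel with transition matrix `B` in which the symbol sent at time $t$ is erased with probability `eps[c]` when the channel is in state $c$; the function names, the use of NumPy, and the small parameters ($N = 8$, $K = 5$, $d = 3$) are hypothetical choices made for illustration only.

```python
import numpy as np


def failure_prob(p: int, e: int) -> float:
    """P_f(p, e) for a random code with p parity checks and e erasures."""
    if e > p:
        return 1.0
    i = np.arange(e)
    return 1.0 - float(np.prod(1.0 - 2.0 ** (i - p)))


def erasure_count_joint(B, eps, n):
    """F[e, i, j] = Pr(E_n = e, C_{n+1} = j | C_1 = i)."""
    k = B.shape[0]
    step_erased = eps[:, None] * B          # symbol erased, then state change
    step_clean = (1.0 - eps)[:, None] * B   # symbol received, then state change
    F = np.zeros((n + 1, k, k))
    F[0] = np.eye(k)
    for t in range(n):
        nxt = np.zeros_like(F)
        nxt[: t + 1] += F[: t + 1] @ step_clean
        nxt[1 : t + 2] += F[: t + 1] @ step_erased
        F = nxt
    return F


def hybrid_arq_probabilities(B, eps, N, K, d):
    """Return ([P_s^(1..d)], [P_f^(1..d)]) as lists of k-by-k matrices."""
    p = d * N - K
    P_succ, P_fail = [], []
    for s in range(1, d + 1):
        F = erasure_count_joint(B, eps, s * N)
        # Untransmitted symbols count as (d - s) N additional erasures.
        w = np.array([failure_prob(p, e + (d - s) * N) for e in range(s * N + 1)])
        Pf = np.tensordot(w, F, axes=1)
        Ps = np.linalg.matrix_power(B, s * N) - Pf
        for r in range(1, s):
            # Remove paths on which decoding already succeeded at attempt r.
            Ps = Ps - P_succ[r - 1] @ np.linalg.matrix_power(B, (s - r) * N)
        P_succ.append(Ps)
        P_fail.append(Pf)
    return P_succ, P_fail


if __name__ == "__main__":
    B = np.array([[0.92, 0.08], [0.02, 0.98]])  # illustrative channel (assumption)
    eps = np.array([1.0, 0.0])                  # state 1 erases, state 2 does not
    P_succ, P_fail = hybrid_arq_probabilities(B, eps, N=8, K=5, d=3)
    for s, Ps in enumerate(P_succ, start=1):
        print(f"attempt {s}:\n{Ps}")
```

A useful sanity check under these assumptions is that, for every initial state, the probabilities of first success at attempts $1, \ldots, d$ and the probability of failure at attempt $d$ add up to one.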

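As mentioned above, an optimistic bound (upper bound) on
performance can be derived using

$$\underline{P}_{\mathrm{s}}^{(d)}(j|i) = \Pr(C_{dN+1} = j \mid C_1 = i) - \sum_{r=1}^{d-1} \sum_{l \in \mathcal{C}} P_{\mathrm{s}}^{(r)}(l|i) \Pr(C_{dN+1} = j \mid C_{rN+1} = l)$$

instead of $P_{\mathrm{s}}^{(d)}(j|i)$. This bound holds irrespective of how the
system handles failures at attempt $d$. We define the optimistic
matrix generating function $\mathbf{G}_{\underline{T}}(z) = \mathbf{G}_{\min\{T, d\}}(z)$ entrywise
by

$$\left[\mathbf{G}_{\underline{T}}(z)\right]_{ij} = \sum_{r=1}^{d-1} P_{\mathrm{s}}^{(r)}(j|i)\, z^r + \underline{P}_{\mathrm{s}}^{(d)}(j|i)\, z^d.$$

The pessimistic matrix generating function $\mathbf{G}_{\overline{T}}(z)$ can be
derived in two steps. First, consider the matrix generating
function

$$\left[\mathbf{G}_{X}(z)\right]_{ij} = \sum_{r=1}^{d} P_{\mathrm{s}}^{(r)}(j|i)\, z^r.$$

Then, under the assumption that information is discarded when
the $d$ decoding attempts have failed, we get

$$\mathbf{G}_{\overline{T}}(z) = \sum_{t=0}^{\infty} \left(z^d \mathbf{P}_{\mathrm{f}}^{(d)}\right)^t \mathbf{G}_{X}(z) = \left(\mathbf{I} - z^d \mathbf{P}_{\mathrm{f}}^{(d)}\right)^{-1} \mathbf{G}_{X}(z).$$

Above, the matrix $\mathbf{P}_{\mathrm{f}}^{(d)}$ is defined entrywise as

$$\left[\mathbf{P}_{\mathrm{f}}^{(d)}\right]_{ij} = P_{\mathrm{f}}^{(d)}(j|i).$$

Note that upon failure to decode a full codeword, the system
must theoretically generate a new codebook. In essence, $\underline{T}$ and
$\overline{T}$ are Markov times that provide lower and upper bounds on
$T$, the true stopping time of the hybrid ARQ decoding process.
The corresponding matrix generating functions will be helpful
in characterizing the hitting time to an empty buffer for hybrid
ARQ systems; this will become manifest shortly.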



C. Hitting Time to an Empty Buffer 

We can build upon the matrix generating function of $T$
to obtain the distribution of $H_0$. The basic insights behind
this characterization are that the sojourn time at any level is
finite almost surely and generating matrices can account for
conditional independence.

Theorem 2: The ordinary generating function of $H_0$, the
first-passage time to an empty queue, is given by

$$G_{H_0}(z) = \boldsymbol{\pi}_0 \left(\mathbf{G}_T(z)\right)^m \mathbf{1} \tag{11}$$

where $\boldsymbol{\pi}_0$ is the channel state probability vector at time zero.

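Proof: This expression for $G_{H_0}(z)$ can be obtained from
an application of mathematical induction, which proceeds
backward in time. The first step consists in showing that the
hypothesis holds for the base case, the sojourn time at level $m$,

$$\begin{aligned}
\left[\boldsymbol{\pi}_0 \mathbf{G}_{T_m}(z)\right]_j
&= \sum_{i=1}^{k} \left[\mathbf{G}_{T_m}(z)\right]_{ij} \Pr(C_1 = i) \\
&= \sum_{i=1}^{k} \mathrm{E}\left[ z^{T_m} \mathbb{1}_{\{C_{NT_m+1} = j\}} \,\middle|\, C_1 = i \right] \Pr(C_1 = i) \\
&= \mathrm{E}\left[ z^{T_m} \mathbb{1}_{\{C_{NT_m+1} = j\}} \right]
 = \mathrm{E}\left[ z^{H_{m-1}} \mathbb{1}_{\{C_{NH_{m-1}+1} = j\}} \right]
\end{aligned}$$

where we have used the fact that $H_{m-1} = T_m$. Thus, we
gather that

$$[z^t] \left[\boldsymbol{\pi}_0 \mathbf{G}_{T_m}(z)\right]_j = \Pr\left(H_{m-1} = t,\, C_{NH_{m-1}+1} = j\right).$$

We continue with the inductive step in a similar manner.
Suppose that the hypothesis is true for a certain integer $q$ where
$0 < q < m$; that is,

$$\left[\boldsymbol{\pi}_0 \mathbf{G}_{H_q}(z)\right]_j = \mathrm{E}\left[ z^{H_q} \mathbb{1}_{\{C_{NH_q+1} = j\}} \right] = \left[\boldsymbol{\pi}_0 \mathbf{G}_{T_m}(z) \cdots \mathbf{G}_{T_{q+1}}(z)\right]_j.$$

Then, we can write

$$\begin{aligned}
\mathrm{E}\left[ z^{H_{q-1}} \mathbb{1}_{\{C_{NH_{q-1}+1} = j\}} \right]
&= \mathrm{E}\left[ z^{H_q + T_q} \mathbb{1}_{\{C_{NH_{q-1}+1} = j\}} \right] \\
&= \sum_{i=1}^{k} \mathrm{E}\left[ z^{H_q + T_q} \mathbb{1}_{\{C_{NH_{q-1}+1} = j\}} \,\middle|\, C_{NH_q+1} = i \right] \Pr(C_{NH_q+1} = i) \\
&= \sum_{i=1}^{k} \mathrm{E}\left[ z^{H_q} \,\middle|\, C_{NH_q+1} = i \right] \mathrm{E}\left[ z^{T_q} \mathbb{1}_{\{C_{NH_{q-1}+1} = j\}} \,\middle|\, C_{NH_q+1} = i \right] \Pr(C_{NH_q+1} = i) \\
&= \sum_{i=1}^{k} \mathrm{E}\left[ z^{H_q} \mathbb{1}_{\{C_{NH_q+1} = i\}} \right] \left[\mathbf{G}_{T_q}(z)\right]_{ij} \\
&= \sum_{i=1}^{k} \left[\boldsymbol{\pi}_0 \mathbf{G}_{T_m}(z) \cdots \mathbf{G}_{T_{q+1}}(z)\right]_i \left[\mathbf{G}_{T_q}(z)\right]_{ij} \\
&= \left[\boldsymbol{\pi}_0 \mathbf{G}_{T_m}(z) \cdots \mathbf{G}_{T_q}(z)\right]_j = \left[\boldsymbol{\pi}_0 \mathbf{G}_{H_{q-1}}(z)\right]_j.
\end{aligned}$$

That is, the hypothesis is also true for $q - 1$. We note that the
third equality follows from the conditional independence of
our quasi-birth-death Markov process. In our problem, we have
$\mathbf{G}_{T_q}(z) = \mathbf{G}_T(z)$ for all $q \in \{1, \ldots, m\}$. Since this
expression holds for any $\boldsymbol{\pi}_0$, we conclude that $\mathbf{G}_{H_0}(z) = \left(\mathbf{G}_T(z)\right)^m$
and, as a consequence,

$$[z^t] \left[\boldsymbol{\pi}_0 \left(\mathbf{G}_T(z)\right)^m\right]_j = \Pr\left(H_0 = t,\, C_{NH_0+1} = j\right).$$

Summing over all the possible end states, we recover the
expression for $G_{H_0}(z)$ given in (11). ■

To differentiate among possible initial conditions, it will
become useful to write the first-passage time to an empty
queue with an initial buffer size of $m$ segments as $H_0^{(m)}$.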
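To make Theorem 2 concrete, the sketch below extracts the coefficients of $G_{H_0}(z)$ by propagating probability mass across queue levels rather than by symbolic expansion. It assumes that each sojourn time has matrix generating function $(\mathbf{I} - z\mathbf{K})^{-1} z\mathbf{M}$, where $\mathbf{K}$ and $\mathbf{M}$ collect the per-attempt failure and success transition probabilities; the function name, the truncation horizon `t_max`, and the numerical matrices are illustrative assumptions.

```python
import numpy as np


def first_passage_pmf(pi0, K, M, m, t_max):
    """Return pmf with pmf[t] = Pr(H_0 = t) for t = 0, ..., t_max."""
    if m < 1:
        raise ValueError("the buffer must hold at least one segment")
    k = len(pi0)
    levels = np.zeros((m + 1, k))  # levels[q, c]: q segments queued, channel state c
    levels[m] = pi0
    pmf = np.zeros(t_max + 1)
    for t in range(1, t_max + 1):
        stay = levels @ K      # decoding failure: queue length unchanged
        leave = levels @ M     # decoding success: one segment departs
        levels = stay
        levels[:-1] += leave[1:]
        pmf[t] = levels[0].sum()
        levels[0] = 0.0        # an empty queue ends the process
    return pmf


if __name__ == "__main__":
    # Hypothetical two-state matrices; each row of K + M sums to one.
    K = np.array([[0.30, 0.05], [0.02, 0.08]])
    M = np.array([[0.45, 0.20], [0.10, 0.80]])
    pi0 = np.array([0.5, 0.5])
    pmf = first_passage_pmf(pi0, K, M, m=10, t_max=400)
    t = np.arange(pmf.size)
    print("mass captured:", pmf.sum())
    print("mean hitting time (truncated):", float(t @ pmf))
    print("Pr(H_0 > 15):", 1.0 - pmf[:16].sum())
```

The truncated mean and tail probability are only as accurate as the horizon `t_max` allows, so the captured mass should be checked before the numbers are used.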

IV. Large Deviation Analysis 

As seen in the previous section, it is possible to evaluate
the exact distribution of $H_0$. This facilitates the selection
of parameters to optimize overall performance. However, this
process becomes cumbersome for large buffer sizes. In such 
circumstances, analyzing the large deviations governing the 
system offers a new direction to derive meaningful guidelines 
for resource allocation and parameter tuning. Below, we study 
two types of aberrations under the ARQ scheme: deviations 
in the average transmission time and the mean service rate. 
We note that, although large deviations can be studied under 
hybrid ARQ, this latter scenario is somewhat tedious and 
it offers limited additional insights. Hence we restrict our 
attention to the ARQ scheme. We begin with the average 
transmission time; that is, the normalized first-passage time 
to an empty queue. 



A. Normalized First-Passage Time 

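Again, suppose that the transmit buffer contains exactly $m$
segments at the onset of the communication process. We are
interested in the large deviations associated with the sequence
of random variables specified by

$$Y_m = \frac{1}{m} H_0^{(m)} = \frac{1}{m} \sum_{q=1}^{m} T_q, \qquad m = 1, 2, \ldots$$

The logarithmic moment generating function for $Y_m$ is

$$\Lambda_m(\lambda) = \log \mathrm{E}\left[e^{\lambda Y_m}\right] = \log \mathrm{E}\left[e^{\frac{\lambda}{m} H_0^{(m)}}\right] = \log G_{H_0^{(m)}}\left(e^{\lambda/m}\right).$$

The existence of limits of properly scaled logarithmic moment
generating functions suggests that $\{Y_m\}$ may satisfy a large
deviation principle [22]. In particular, consider the following
asymptotic regime

$$\Lambda(\lambda) = \lim_{m \to \infty} \frac{1}{m} \Lambda_m(m\lambda) = \lim_{m \to \infty} \frac{1}{m} \log G_{H_0^{(m)}}\left(e^{\lambda}\right) = \lim_{m \to \infty} \frac{1}{m} \log\left(\boldsymbol{\pi}_0 \left(\mathbf{G}_T\left(e^{\lambda}\right)\right)^m \mathbf{1}\right). \tag{12}$$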



A few observations concerning $\Lambda(\lambda)$ are in order. We remark
that, for any $z = e^{\lambda}$,

$$\mathbf{G}_T\left(e^{\lambda}\right) = \left(\sum_{t=0}^{\infty} \mathbf{K}^t e^{t\lambda}\right) \mathbf{M} e^{\lambda}$$

is a non-negative matrix over the extended real numbers. In
fact, this matrix possesses additional properties which are
summarized in the lemma below.

Lemma 2: If $T$ is finite almost surely, the matrix generator
$\mathbf{G}_T(e^{\lambda})$ exists as a non-negative real matrix if and only if
$\lambda < -\log \varrho(\mathbf{K})$. In particular, when $\lambda \ge -\log \varrho(\mathbf{K})$, one or
more entries of $\mathbf{G}_T(e^{\lambda})$ will be infinite.

Proof: See Appendix. ■

Another important quantity is the spectral radius of $\mathbf{K}$,
which is related to the support of $\mathbf{G}_T(e^{\lambda})$ as seen in Lemma 2.

Corollary 1: If $T$ is finite almost surely, then $\varrho(\mathbf{K}) < 1$.

Proof: See Appendix. ■

Under Assumption 1 and for any non-trivial coding scheme,
$T$ is finite almost surely; thus the hypotheses of Lemma 2
and Corollary 1 are satisfied. A sufficient condition to ensure
the existence of a large deviation principle for the average
transmission time is that the Markov process $\{U_t\}$ sampled at
departure events $\{H_q\}$ is irreducible. This guarantees that the
states of the corresponding jump chain form a unique recurrent
class. Formally, we postulate the following condition.

Assumption 2: The matrix $(\mathbf{I} - \mathbf{K})^{-1} \mathbf{M}$ is irreducible.

We note that, strictly speaking, this is not a necessary
condition. Having a unique communicating class and, possibly,
transient states in the jump chain will also work. However, this
latter more encompassing setting leads to extra bookkeeping,
which unnecessarily clouds some of the underlying concepts.
Furthermore, all the practical systems we wish to study fulfill
the requirements of Assumption 2. As such, we take it for
granted from this point forward. Under this assumption, the
matrix $\mathbf{G}_T(e^{\lambda})$ is irreducible for any $\lambda < -\log \varrho(\mathbf{K})$ and,
hence, the Perron-Frobenius theorem applies [36], [22]. This
leads to the following result.

Proposition 2: Under Assumption 2, the limiting moment
generating function defined in (12) exists as an extended real
number for every $\lambda \in \mathbb{R}$, with

$$\Lambda(\lambda) = \begin{cases} \log \varrho\left(\left(\mathbf{I} - \mathbf{K} e^{\lambda}\right)^{-1} \mathbf{M} e^{\lambda}\right) & \lambda < -\log \varrho(\mathbf{K}) \\ \infty & \text{otherwise.} \end{cases}$$

Proof: See Appendix. ■

Using matrix norms, it can be shown that $\mathbf{G}_T(e^{\lambda})$ is
differentiable entrywise over the interval $\lambda < -\log \varrho(\mathbf{K})$.
Since $e^{\Lambda(\lambda)}$ is an isolated root of the characteristic polynomial
of matrix $\mathbf{G}_T(e^{\lambda})$, we deduce that it is positive, finite and
differentiable with respect to $\lambda$ (see, e.g., [37, Th. 11.5.1], [22,
p. 75]). Corollary 1 asserts that $\varrho(\mathbf{K}) < 1$, which implies that
$\Lambda(0)$ is finite. In view of the discussion above, we conclude
that the origin is in the interior of $\{\lambda \in \mathbb{R} : \Lambda(\lambda) < \infty\}$.
Consequently, $\Lambda(\lambda)$ is essentially smooth and the Gärtner-Ellis
theorem applies [22], thereby establishing the desired result.

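Theorem 3: Suppose $\left\{Y_m = \frac{1}{m} \sum_{q=1}^{m} T_q\right\}$ is the empirical
mean sojourn time per level. For every $x \in \mathbb{R}$, consider the
Fenchel-Legendre transform

$$\Lambda^*(x) = \sup_{\lambda \in \mathbb{R}} \left\{\lambda x - \log \varrho\left(\mathbf{G}_T\left(e^{\lambda}\right)\right)\right\}. \tag{13}$$

The empirical mean $Y_m$ satisfies the large deviation principle
with the convex, good rate function $\Lambda^*(\cdot)$. That is, for any set
$\Gamma \subseteq \mathbb{R}$ and any initial state $c \in \mathcal{C}$,

$$-\inf_{x \in \Gamma^{\circ}} \Lambda^*(x) \le \liminf_{m \to \infty} \frac{1}{m} \log \Pr(Y_m \in \Gamma) \le \limsup_{m \to \infty} \frac{1}{m} \log \Pr(Y_m \in \Gamma) \le -\inf_{x \in \overline{\Gamma}} \Lambda^*(x).$$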

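Example 1: For the Gilbert-Elliott channel shown in Fig. 1,
it is possible to obtain a closed-form expression for the spectral
radius of $\mathbf{G}_T(e^{\lambda})$. Specifically, we can write the characteristic
polynomial of $\mathbf{G}_T(e^{\lambda})$ as

$$\det\left(\gamma \mathbf{I} - \mathbf{G}_T\left(e^{\lambda}\right)\right) = \det\left(\gamma \mathbf{I} - \left(\mathbf{I} - \mathbf{K} e^{\lambda}\right)^{-1} \mathbf{M} e^{\lambda}\right) = \frac{\det\left(\gamma \mathbf{I} - \gamma \mathbf{K} e^{\lambda} - \mathbf{M} e^{\lambda}\right)}{\det\left(\mathbf{I} - \mathbf{K} e^{\lambda}\right)}.$$

We note that the numerator is a quadratic polynomial in $\gamma$ and the
denominator does not depend on $\gamma$. It is therefore possible to find
parametric expressions for the two roots of $\det\left(\gamma \mathbf{I} - \mathbf{G}_T(e^{\lambda})\right)$.
Taking the maximum of the absolute values of these two roots
yields an explicit, albeit convoluted, expression for the spectral
radius of $\mathbf{G}_T(e^{\lambda})$. As such, $\Lambda^*(\cdot)$ can be obtained efficiently.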
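The closed-form route of Example 1 can be sketched in a few lines of Python. The code below forms the quadratic in $\gamma$ from the numerator determinant, checks it against a generic eigenvalue routine, and approximates $\Lambda^*(x)$ by a grid search over $\lambda$. The matrices `K` and `M`, the grid limits, and all function names are illustrative assumptions rather than values taken from the article.

```python
import numpy as np


def spectral_radius(A):
    """Largest eigenvalue modulus of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(A))))


def gen_matrix(lam, K, M):
    """G_T(e^lam) = (I - K e^lam)^{-1} M e^lam, valid for lam < -log rho(K)."""
    I = np.eye(K.shape[0])
    return np.linalg.solve(I - K * np.exp(lam), M * np.exp(lam))


def rho_closed_form(lam, K, M):
    """Largest |root| of det(gamma (I - K e^lam) - M e^lam) = 0, two states."""
    A = np.eye(2) - K * np.exp(lam)
    C = M * np.exp(lam)
    a2 = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]
    a1 = -(A[0, 0] * C[1, 1] + A[1, 1] * C[0, 0]
           - A[0, 1] * C[1, 0] - A[1, 0] * C[0, 1])
    a0 = C[0, 0] * C[1, 1] - C[0, 1] * C[1, 0]
    return float(np.max(np.abs(np.roots([a2, a1, a0]))))


def rate_function(x, K, M, num=4001):
    """Grid approximation of Lambda*(x) = sup_lam {lam x - log rho(G_T(e^lam))}."""
    lam_max = -np.log(spectral_radius(K))
    # lam = 0 is appended so that the approximation never drops below zero.
    grid = np.append(np.linspace(-10.0, lam_max - 1e-6, num), 0.0)
    values = [lam * x - np.log(rho_closed_form(lam, K, M)) for lam in grid]
    return float(max(values))


if __name__ == "__main__":
    # Hypothetical two-state matrices; each row of K + M sums to one.
    K = np.array([[0.30, 0.05], [0.02, 0.08]])
    M = np.array([[0.45, 0.20], [0.10, 0.80]])
    for x in (1.0, 1.5, 2.0, 5.0):
        print(x, rate_function(x, K, M))
```

A grid search is used here only for transparency; any one-dimensional concave maximizer could replace it.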

B. Empirical Mean Service 

We turn to the second type of aberrations we wish to study:
deviations in the empirical mean service rate,

$$Z_s = \frac{1}{s} \sum_{t=1}^{s} D_t.$$

We note that $\{D_s\}$ is not a Markov process. However, $D_s$
is a (trivial) deterministic function of $V_s = \left(C_{(s+1)N+1}, D_s\right)$.
Since $\{V_s\}$ is a Markov process, we can apply general results
on the large deviation principle of additive functionals of
Markov chains. To leverage these results, we first impose an
ordering on the state space $\mathcal{V} = \mathcal{C} \times \{0, 1\}$. Recall that $|\mathcal{C}| = k$;
a natural ordering for this state space is to associate integer
$v = dk + i$ with state $(i, d)$. Using this ordering, the transition
probability matrix $\boldsymbol{\Pi}$ for the augmented process $\{V_s\}$ is given
by

$$[\boldsymbol{\Pi}]_{v_1 v_2} = \pi(v_1, v_2), \qquad v_1, v_2 \in \{1, \ldots, 2k\}$$

where $\pi(v_1, v_2)$ is the probability of jumping to state $v_2$,
conditioned on starting from $v_1$.

Assumption 3: The matrix $\boldsymbol{\Pi}$ is irreducible.

This assumption is similar in spirit to Assumption 2. Yet
the large deviation principle on the empirical service can be
derived under weaker conditions. In particular, it suffices to
show that $\mathbf{K} + \mathbf{M}$ is irreducible, a requirement that is easily
met. We stress that $\mathbf{K} + \mathbf{M}$ is equal to $\mathbf{B}^N$, and the latter
matrix is itself irreducible by Assumption 1.

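Theorem 4 ([22]): Let $\{V_s\}$ be a finite-state Markov chain
possessing an irreducible transition matrix $\boldsymbol{\Pi}$. For every
$x \in \mathbb{R}$, define

$$I(x) = \sup_{\lambda \in \mathbb{R}} \left\{\lambda x - \log \varrho\left(\boldsymbol{\Pi}_{\lambda}\right)\right\} \tag{14}$$

where $\boldsymbol{\Pi}_{\lambda}$ is a nonnegative matrix whose elements are

$$\pi_{\lambda}(v_1, v_2) = \pi(v_1, v_2)\, e^{\lambda d_2}, \qquad v_1, v_2 \in \{1, \ldots, 2k\}.$$

Then, the empirical mean $Z_s$ satisfies the large deviation
principle with the convex good rate function $I(\cdot)$. Explicitly,
for any set $\Gamma \subseteq \mathbb{R}$, and any initial state $v \in \mathcal{V}$,

$$-\inf_{x \in \Gamma^{\circ}} I(x) \le \liminf_{s \to \infty} \frac{1}{s} \log P_v^{\pi}(Z_s \in \Gamma) \le \limsup_{s \to \infty} \frac{1}{s} \log P_v^{\pi}(Z_s \in \Gamma) \le -\inf_{x \in \overline{\Gamma}} I(x)$$

where $P_v^{\pi}$ denotes the Markov probability measure induced
by transition probability $\boldsymbol{\Pi}$ and initial state $v \in \mathcal{V}$, i.e.,

$$P_v^{\pi}(V_1 = v_1, \ldots, V_s = v_s) = \pi(v, v_1) \prod_{t=1}^{s-1} \pi(v_t, v_{t+1}).$$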

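Expressions for the transition probabilities used in this
theorem were introduced earlier. We note that

$$\Pr\left(V_{s+1} = (j, d_2) \mid V_s = (i, d_1)\right) = \Pr\left(V_{s+1} = (j, d_2) \mid C_{(s+1)N+1} = i\right);$$

this induces a repetitive structure in matrix $\boldsymbol{\Pi}$. The nonnegative
matrix $\boldsymbol{\Pi}_{\lambda}$ associated with every $\lambda \in \mathbb{R}$ can then be written
explicitly as

$$\boldsymbol{\Pi}_{\lambda} = \begin{bmatrix}
\kappa_{11} & \cdots & \kappa_{1k} & \mu_{11} e^{\lambda} & \cdots & \mu_{1k} e^{\lambda} \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
\kappa_{k1} & \cdots & \kappa_{kk} & \mu_{k1} e^{\lambda} & \cdots & \mu_{kk} e^{\lambda} \\
\kappa_{11} & \cdots & \kappa_{1k} & \mu_{11} e^{\lambda} & \cdots & \mu_{1k} e^{\lambda} \\
\vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\
\kappa_{k1} & \cdots & \kappa_{kk} & \mu_{k1} e^{\lambda} & \cdots & \mu_{kk} e^{\lambda}
\end{bmatrix}. \tag{15}$$

We can rewrite $\boldsymbol{\Pi}_{\lambda}$ by taking advantage of its block structure,

$$\boldsymbol{\Pi}_{\lambda} = \begin{bmatrix} \mathbf{K} & \mathbf{M} e^{\lambda} \\ \mathbf{K} & \mathbf{M} e^{\lambda} \end{bmatrix}.$$
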
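The pertinent eigenvalues are the roots of the characteristic
polynomial of $\boldsymbol{\Pi}_{\lambda}$. Using properties of matrix determinants and
the commutative properties of some of the blocks, we can
express this polynomial as

$$\begin{aligned}
\det\left(\gamma \mathbf{I} - \boldsymbol{\Pi}_{\lambda}\right)
&= \det\left(\left(\gamma \mathbf{I} - \mathbf{M} e^{\lambda}\right)\left(\gamma \mathbf{I} - \mathbf{K}\right) - \mathbf{M}\mathbf{K} e^{\lambda}\right) \\
&= \det\left(\left(\gamma \mathbf{I} - \mathbf{K}\right)\left(\gamma \mathbf{I} - \mathbf{M} e^{\lambda}\right) - \mathbf{K}\mathbf{M} e^{\lambda}\right) \\
&= \det\left(\gamma^2 \mathbf{I} - \gamma \mathbf{K} - \gamma \mathbf{M} e^{\lambda}\right).
\end{aligned}$$

Collectively, Theorem 4 and the matrix defined in (15)
provide an algorithmic workflow for the computation of the
good rate function associated with the empirical means $\{Z_s\}$.
We follow this discussion with an example based on a two-
state channel with memory.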

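Example 2: Once again, consider a Gilbert-Elliott erasure
channel with $\mathcal{C} = \{1, 2\}$. An advantage in studying this
rudimentary model is that it admits a simple, closed-form
characterization. The dimension of the state space in this case
is $|\mathcal{V}| = 4$. Using the commutative block structure discussed
above, the determinant of $\left(\gamma \mathbf{I} - \boldsymbol{\Pi}_{\lambda}\right)$ reduces to

$$\det\left(\gamma \mathbf{I} - \boldsymbol{\Pi}_{\lambda}\right) = \det\left(\gamma^2 \mathbf{I} - \gamma \mathbf{K} - \gamma \mathbf{M} e^{\lambda}\right) = \gamma^2 \det \begin{bmatrix} \gamma - \kappa_{11} - \mu_{11} e^{\lambda} & -\kappa_{12} - \mu_{12} e^{\lambda} \\ -\kappa_{21} - \mu_{21} e^{\lambda} & \gamma - \kappa_{22} - \mu_{22} e^{\lambda} \end{bmatrix}.$$

By inspection, we see that the spectral radius of $\boldsymbol{\Pi}_{\lambda}$ is the
largest root of the quadratic equation

$$\gamma^2 - \gamma\left(\kappa_{11} + \kappa_{22} + (\mu_{11} + \mu_{22}) e^{\lambda}\right) + \left(\kappa_{11} + \mu_{11} e^{\lambda}\right)\left(\kappa_{22} + \mu_{22} e^{\lambda}\right) - \left(\kappa_{12} + \mu_{12} e^{\lambda}\right)\left(\kappa_{21} + \mu_{21} e^{\lambda}\right) = 0.$$

For fixed parameters, this dominating root can be computed
using the celebrated quadratic formula. We will revisit this
example in Section VI.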
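The computation in Example 2 can be mirrored numerically. The sketch below builds the tilted block matrix, obtains the dominant root of the quadratic with the quadratic formula, compares it with a generic eigenvalue computation, and approximates $I(x)$ on a grid. The entries of `K` and `M`, the grid, and the function names are hypothetical and serve only as an illustration.

```python
import numpy as np


def tilted_matrix(lam, K, M):
    """Pi_lam = [[K, M e^lam], [K, M e^lam]] under the ordering v = d k + i."""
    top = np.hstack([K, M * np.exp(lam)])
    return np.vstack([top, top])


def dominant_root(lam, K, M):
    """Largest root of the quadratic in Example 2 (two-state channel)."""
    S = K + M * np.exp(lam)
    trace = S[0, 0] + S[1, 1]
    det = S[0, 0] * S[1, 1] - S[0, 1] * S[1, 0]
    # The discriminant equals (S00 - S11)^2 + 4 S01 S10, hence it is nonnegative.
    return 0.5 * (trace + np.sqrt(trace * trace - 4.0 * det))


def service_rate_function(x, K, M, grid=None):
    """Grid approximation of I(x) = sup_lam {lam x - log rho(Pi_lam)}."""
    if grid is None:
        grid = np.linspace(-30.0, 30.0, 6001)
    return float(max(lam * x - np.log(dominant_root(lam, K, M)) for lam in grid))


if __name__ == "__main__":
    # Hypothetical two-state matrices; each row of K + M sums to one.
    K = np.array([[0.30, 0.05], [0.02, 0.08]])
    M = np.array([[0.45, 0.20], [0.10, 0.80]])
    for x in (0.3, 0.5, 0.7, 0.9):
        print(x, service_rate_function(x, K, M))
```

Because the factor $\gamma^2$ only contributes roots at zero, the dominant root of the quadratic coincides with the spectral radius of the full four-by-four matrix.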

C. Relation between $\Lambda^*(\cdot)$ and $I(\cdot)$

The two rate functions introduced above, $\Lambda^*(\cdot)$ and $I(\cdot)$,
characterize the large deviation principles for the mean
transmission time and average service rate, respectively. Since the
processes $\{T_q\}$ and $\{D_s\}$ are closely related, one can presume
that their governing rate functions are somehow linked. A key
insight in understanding this relation is to realize that the
following events are equivalent: for any positive integers $m$
and $n$,

$$\{T_1 + \cdots + T_m > n\} = \{D_1 + \cdots + D_n < m\}. \tag{16}$$

In words, the first event occurs whenever more than $n$ attempts
are required to successfully deliver $m$ packets, while the
second event states that fewer than $m$ packet transmissions
have been successful within the first $n$ attempts. Using this
relationship and scaling arguments, one can establish our
next proposition, which substantiates the existence of a strong
connection between the two rate functions.

Proposition 3: If the rate functions $\Lambda^*(\cdot)$ and $I(\cdot)$ are finite
in the open intervals $(1, \infty)$ and $(0, 1)$, respectively, then they
satisfy

$$I(x) = x \Lambda^*\left(\frac{1}{x}\right)$$

for $x \in (0, 1)$.

Proof: See Appendix. ■
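Proposition 3 lends itself to a quick numerical check. The following sketch approximates both rate functions on fine grids and compares $I(x)$ with $x \Lambda^*(1/x)$ at a few points. It assumes the ARQ form $(\mathbf{I} - \mathbf{K} e^{\lambda})^{-1} \mathbf{M} e^{\lambda}$ for the sojourn-time generator and uses the determinant identity of Section IV-B, which implies that the nonzero eigenvalues of $\boldsymbol{\Pi}_{\lambda}$ are those of $\mathbf{K} + \mathbf{M} e^{\lambda}$. The matrices, grids, and function names are hypothetical.

```python
import numpy as np


def log_rho(stack):
    """Log spectral radius of each matrix in a stack of shape (n, k, k)."""
    return np.log(np.max(np.abs(np.linalg.eigvals(stack)), axis=-1))


def sojourn_rate(x, K, M, num=20001):
    """Lambda*(x), with Lambda(lam) = log rho((I - K e^lam)^{-1} M e^lam)."""
    lam_max = -log_rho(K[None])[0]
    lam = np.linspace(-15.0, lam_max - 1e-6, num)
    scale = np.exp(lam)[:, None, None]
    G = np.linalg.solve(np.eye(len(K)) - K * scale, M * scale)
    return float(np.max(lam * x - log_rho(G)))


def service_rate(x, K, M, num=20001):
    """I(x), using rho(Pi_theta) = rho(K + M e^theta)."""
    theta = np.linspace(-15.0, 15.0, num)
    S = K + M * np.exp(theta)[:, None, None]
    return float(np.max(theta * x - log_rho(S)))


if __name__ == "__main__":
    # Hypothetical two-state matrices; each row of K + M sums to one.
    K = np.array([[0.30, 0.05], [0.02, 0.08]])
    M = np.array([[0.45, 0.20], [0.10, 0.80]])
    for x in (0.3, 0.5, 0.7, 0.9):
        print(x, service_rate(x, K, M), x * sojourn_rate(1.0 / x, K, M))
```

Agreement is limited by the grid resolution, so the two columns should match only up to a small discretization error.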

V. Performance Evaluation 

Thus far, we have devoted much attention to developing a
thorough understanding of $H_0$ and, in particular, its generating
function. In this section, we apply the results of Theorem 2
and we derive a number of pertinent performance criteria with
practical significance.

First, recall that $[z^t] G_{H_0}(z) = \Pr(H_0 = t)$. Accordingly,
the probability that the queue fails to drain within $\tau$ time units
is equal to

$$\Pr(H_0 > \tau) = 1 - \sum_{t=0}^{\lfloor \tau \rfloor} [z^t] G_{H_0}(z).$$

Moreover, the average time required to empty the queue is
obtained by differentiating the generating function of
$H_0$ and then taking the limit as $z$ approaches one,

$$\mathrm{E}[H_0] = \lim_{z \uparrow 1} \frac{d}{dz} G_{H_0}(z).$$

Alternatively, using Chernoff inequalities, it is possible
to upper bound the probability of a deviation event in a
computationally efficient manner,

$$\Pr(H_0 > \tau) \le e^{-\lambda \tau}\, \mathrm{E}\left[e^{\lambda H_0}\right] = e^{-\lambda \tau}\, G_{H_0}\left(e^{\lambda}\right)$$

for any $\lambda > 0$. The optimal bound derived from this collection
of inequalities is sometimes expressed in logarithmic form,

$$\log \Pr(H_0 > \tau) \le -\sup_{\lambda > 0} \left\{\lambda \tau - \log\left(G_{H_0}\left(e^{\lambda}\right)\right)\right\}.$$

The large deviation principle on $H_0$ derived in Section IV
confirms that, under mild conditions, this latter bound is
asymptotically tight.
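The Chernoff bound above is straightforward to evaluate once $G_{H_0}(e^{\lambda})$ is available in matrix form. The sketch below minimizes the bound over a grid of admissible $\lambda$ and, for a single-state channel where the exact tail follows from the event equivalence in (16), compares the bound with the true probability. It assumes the ARQ form of the sojourn-time generator; the parameters and function names are illustrative.

```python
from math import comb

import numpy as np


def chernoff_bound(tau, pi0, K, M, m, num=2000):
    """min over lam in (0, lam_max) of exp(-lam tau) * pi0 G_T(e^lam)^m 1."""
    k = len(pi0)
    lam_max = -np.log(np.max(np.abs(np.linalg.eigvals(K))))
    best = 1.0  # a probability never exceeds one
    for lam in np.linspace(0.0, lam_max, num, endpoint=False)[1:]:
        G = np.linalg.solve(np.eye(k) - K * np.exp(lam), M * np.exp(lam))
        mgf = pi0 @ np.linalg.matrix_power(G, m) @ np.ones(k)
        best = min(best, float(np.exp(-lam * tau) * mgf))
    return best


if __name__ == "__main__":
    # Single-state sanity check (assumption): each attempt succeeds w.p. 1/2.
    pi0, K, M = np.array([1.0]), np.array([[0.5]]), np.array([[0.5]])
    m, tau = 3, 12
    # By (16), H_0 > tau means fewer than m successes in the first tau attempts.
    exact = sum(comb(tau, j) for j in range(m)) / 2 ** tau
    print("exact tail:", exact)
    print("Chernoff bound:", chernoff_bound(tau, pi0, K, M, m))
```

For short horizons the bound can be loose; its appeal lies in avoiding the full expansion of the generating function.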

It may be instructive to stress that $H_0$, the first-passage
time introduced earlier, is defined in terms of codeword
transmission attempts. That is, $H_0$ represents the cumulative
number of codewords sent by the source until the queue
empties out completely. Such a metric poses no issue when
comparing systems of identical block lengths. However, when
assessing the performance of candidate implementations with
different block lengths, a more careful interpretation of the
results becomes necessary. This subtlety arises because of the
mismatch in indexing between the evolution of the queue and
the number of channel uses. For a fair evaluation of potential
candidates, hitting times should be scaled to portray their
evolution according to a common clock, that of the channel
process.

Define random variable $\tilde{H}_0$ by

$$\tilde{H}_0 = N H_0,$$

where $N$ designates the block length associated with the
underlying implementation. Then, $\tilde{H}_0$ denotes the number
of channel uses necessary to empty out the queue, and it
can therefore be employed to provide a uniform measure
of performance. While it is straightforward to extend our
performance criteria to $\tilde{H}_0$ through the relation

$$\Pr\left(\tilde{H}_0 > \tau\right) = \Pr\left(H_0 > \frac{\tau}{N}\right),$$

it is essential to apply this transformation when comparing
systems with different block lengths.

A similar scaling is needed when comparing the large
deviations of systems with different parameters. A proper
scaling for the fair comparison of mean sojourn times can
be expressed in terms of channel uses per information bit,

$$\tilde{Y}_{\ell} = \frac{N}{\ell} H_0^{(\lceil \ell/K \rceil)}.$$

This leads to the following asymptotic regime

$$\begin{aligned}
\lim_{\ell \to \infty} \frac{1}{\ell} \log \Pr\left(\tilde{Y}_{\ell} > \tau\right)
&= \lim_{\ell \to \infty} \frac{1}{K} \frac{1}{\lceil \ell/K \rceil} \log \Pr\left(\frac{1}{\lceil \ell/K \rceil} H_0^{(\lceil \ell/K \rceil)} > \frac{K}{N} \tau\right) \\
&= \frac{1}{K} \lim_{m \to \infty} \frac{1}{m} \log \Pr\left(\frac{1}{m} H_0^{(m)} > \frac{K}{N} \tau\right)
\end{aligned}$$

where $\tau > \mathrm{E}[\tilde{Y}_{\infty}]$. Likewise, to account for discrepancies
in design parameters, the empirical mean service can be
expressed in terms of decoded bits per channel use,

$$\tilde{Z}_n = \frac{1}{n} \sum_{t=1}^{\lfloor n/N \rfloor} K D_t.$$

The ensuing asymptotic regime becomes

$$\begin{aligned}
\lim_{n \to \infty} \frac{1}{n} \log \Pr\left(\tilde{Z}_n < \eta\right)
&= \frac{1}{N} \lim_{n \to \infty} \frac{1}{\lfloor n/N \rfloor} \log \Pr\left(\frac{1}{\lfloor n/N \rfloor} \sum_{t=1}^{\lfloor n/N \rfloor} D_t < \frac{N}{K} \eta\right) \\
&= \frac{1}{N} \lim_{s \to \infty} \frac{1}{s} \log \Pr\left(Z_s < \frac{N}{K} \eta\right)
\end{aligned}$$

where $\eta < \mathrm{E}[\tilde{Z}_{\infty}]$. Collectively, these various modifications
enable the comparison of competing implementations with
different values for $K$ and $N$.

Another concern that comes into play when optimizing over
block length is the impact of the initial state of the system.
If the number of bits at the source is fixed at time zero, the
scope of the optimal solution may be very narrow. This is a
situation akin to over-fitting in statistical modeling. To provide
a more robust characterization with widely applicable results
and guidelines, it may be beneficial to assume that the number
of bits in the queue at the onset of the transmission process
is random, with a prescribed representative distribution. In
our numerical study, we circumvent some of these difficulties
by assuming that the block length is fixed and the initial
queue length is random. The specifics of our investigation are
detailed below.

VI. Numerical Analysis

In this section, we apply the methodology developed above
to an illustrative example. Physical parameters are selected to
resemble an implementation of the global system for mobile
communications (GSM). Specifically, the block length is fixed
at $N = 114$. The information content per codeword, $K$, is a
parameter to be optimized. We model the wireless connection
as a Gilbert-Elliott erasure channel. Again, we denote the
transition probability matrix for this Markov process by

$$\mathbf{B} = \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \end{bmatrix}.$$

For simplicity, we assume that $\varepsilon_1 = 1$ and $\varepsilon_2 = 0$. The
probability of a bit erasure is set at twenty percent, which
implies

$$\frac{b_{21}}{b_{12} + b_{21}} = 0.2.$$

For this elementary model, channel memory can be expressed
unambiguously through the decay factor $(1 - b_{12} - b_{21})$, which
is determined by the spectrum of the matrix. A decay factor
equal to zero is equivalent to a memoryless channel, while
correlation increases as $(1 - b_{12} - b_{21})$ approaches one. Except
where specified otherwise, we employ a decay factor equal to
0.9 in our numerical results.
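For concreteness, the two constraints above determine the transition probabilities of the two-state channel. The short sketch below solves for them, assuming that state 1 is the erasing state so that the erasure probability equals the stationary probability of state 1; the function name is a hypothetical choice.

```python
def gilbert_elliott_matrix(erasure_prob: float, decay: float):
    """Two-state transition matrix with stationary Pr(state 1) = erasure_prob
    and second eigenvalue 1 - b12 - b21 = decay."""
    switch = 1.0 - decay            # b12 + b21
    b21 = erasure_prob * switch     # from b21 / (b12 + b21) = erasure_prob
    b12 = switch - b21
    return [[1.0 - b12, b12], [b21, 1.0 - b21]]


if __name__ == "__main__":
    B = gilbert_elliott_matrix(erasure_prob=0.2, decay=0.9)
    for row in B:
        print(row)
```

With an erasure probability of 0.2 and a decay factor of 0.9, this gives $b_{12} \approx 0.08$ and $b_{21} \approx 0.02$, up to floating-point rounding.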

We assume that $L$, the number of information bits contained
at the source at time zero, is a random variable possessing a
Gamma distribution with mean 2000 and standard deviation
100. Randomizing the number of bits at the source partly
alleviates the idiosyncratic effects associated with partitioning
the queue content into segments of $K$ bits. For a source buffer
with $\ell$ information bits, the number of segments to be delivered
is $\lceil \ell/K \rceil$ and, as such, a one-bit variation in $\ell$ can result in
having an additional message to send. Imposing a random
distribution on the number of information bits at the source
leads to a probability distribution on $M = \lceil L/K \rceil$. This, in
turn, yields smoother results.
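The induced distribution on $M = \lceil L/K \rceil$ can be explored with a small Monte Carlo sketch. The code below converts the stated mean and standard deviation into the shape and scale of a Gamma law and samples the number of segments for a given $K$. The sampling approach, the seed, the number of trials, and the function names are assumptions made for illustration; the article does not specify how this distribution was computed.

```python
import math
import random
from collections import Counter


def gamma_shape_scale(mean: float, std: float):
    """Shape and scale of a Gamma law with the given mean and standard deviation."""
    shape = (mean / std) ** 2
    scale = std ** 2 / mean
    return shape, scale


def sample_segment_counts(K: int, mean=2000.0, std=100.0, trials=10_000, seed=0):
    """Monte Carlo draws of M = ceil(L / K) with L Gamma distributed."""
    shape, scale = gamma_shape_scale(mean, std)
    rng = random.Random(seed)
    return [math.ceil(rng.gammavariate(shape, scale) / K) for _ in range(trials)]


if __name__ == "__main__":
    counts = Counter(sample_segment_counts(K=73))
    for m in sorted(counts):
        print(m, counts[m] / 10_000)
```

A mean of 2000 and a standard deviation of 100 correspond to shape 400 and scale 5, so the resulting law is tightly concentrated around its mean.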

Figures 4 and 5 present the mean and variance of the
first-passage times for the ARQ and hybrid ARQ schemes as
functions of the number of information bits per codeword. 
Varying the code rate affects both the expected value of 
the first-passage time and its variance. A low code rate 
offers more protection against erasures and, accordingly, the 
resulting distribution of the hitting time to an empty queue 
is very narrow. Increasing the code rate initially reduces the 
mean first-passage time, as every successful decoding attempt 
reveals more information bits. However, a higher code rate also 
raises the probability of decoding failure. Eventually, as the 
code rate is pushed further, decoding failures start to hamper 
the draining process and the mean first-passage time grows 
due to excessive repetition requests. This effect is much more 
pronounced for standard ARQ. 

The penalty in using a high code rate is less severe for the
hybrid ARQ scheme because the failure recovery mechanism,
which is based on incremental redundancy, adapts gracefully to
channel conditions in this latter case. For instance, when $K$ is
very close to $N$, decoding under standard ARQ will fail nearly
every time. By contrast, the effective code rate drops rapidly
with decoding failures under hybrid ARQ. The robust profile
of hybrid ARQ is a key property that underlies the popularity
of this paradigm in practical systems. In the current example,
the upper and lower bounds derived for $\mathrm{E}[H_0]$ under the hybrid
ARQ scheme are essentially indistinguishable, hinting at the
fact that decoding failures are nearly nonexistent once three
blocks are received.

[Figure 4; horizontal axis: Information Bits (K)]

Fig. 4. This figure shows mean first-passage times as functions of $K$. The
block length employed in all cases is $N = 114$. The underlying Gilbert-Elliott
channel produces erasures with probability 0.20, and it possesses a dominant
decay factor of $(1 - b_{12} - b_{21}) = 0.9$. The expected number of bits at the
source at time zero is 2000. The upper and lower bounds for the hybrid ARQ
scheme with a depth of $d = 3$ are indistinguishable.

[Figure 5; horizontal axis: Information Bits (K)]

Fig. 5. This figure displays variances of the first-passage times to an empty
queue as functions of $K$. The parameters used in this numerical study are the
same as those featured in Fig. 4. The variance for the hybrid ARQ scheme is
calculated with the upper bound $\overline{T}$.



Perhaps not too surprisingly, our numerical investigation
suggests that the optimal code rate is somewhat impervious to
initial queue conditions. To examine the effects of the initial
queue length, we employ the channel parameters described
above and we modify the distribution on $L$. For Gamma
distributions with means $\mathrm{E}[L] \in \{500, 1000, 2000, 3000\}$ and
standard deviation 100, the optimal value of $K$ in terms of
mean first-passage time is consistently equal to 73 for standard
ARQ and it remains fixed at 81 for the hybrid variant.

Using the methodology established thus far, it is possible to
consider additional performance criteria. For instance, we can
analyze the crossings of the cumulative distribution function,

$$h_p = \min\{t \mid \Pr(H_0 \le t) \ge p\}.$$

Fig. 6 plots the number of transmission attempts associated
with threshold values $p \in \{0.45, 0.95\}$. We observe that
the optimal value of $K$ decreases slightly when the crossing
threshold $p$ approaches one. In other words, when focusing
on worst-case behavior, the system tends to favor a more
conservative setting with extra protection against erasures.
This phenomenon offers another perspective on the tradeoff
between expected behavior and its variations.

Fig. 6. The crossings of the cumulative distribution function $F_{H_0}(\cdot)$ offer
conservative figures of merit for the operation of the queued system. In this
example, the lines correspond to thresholds $p \in \{0.45, 0.95\}$.



Next, we turn to the large deviations techniques developed
in Section IV. As a reference, we consider a voice stream
application. In GSM, each speech frame of length 20 ms is
encoded into a data segment of length 228. The underlying
physical layer has the ability to transmit one symbol every
40 μs. If we approximate the maximum delay tolerance for
one-way voice traffic to be 40 ms [38, p. 70], then this requires
228 bits to be transmitted within roughly 1000 channel uses.
This constraint, in turn, necessitates a nominal rate on the
order of 0.23 bits per channel use for link reliability. We use
this figure as a rough estimate for the needs of a voice stream
in our numerical study.
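The arithmetic behind this estimate is spelled out in the short sketch below, which uses the figures quoted in the paragraph above (228 bits, a 40 ms delay budget, and one symbol every 40 μs). The function name and the choice of microseconds as a common unit are illustrative assumptions.

```python
def nominal_rate(segment_bits: int, delay_us: int, symbol_us: int):
    """Return (channel uses within the delay budget, required bits per channel use)."""
    channel_uses = delay_us // symbol_us
    return channel_uses, segment_bits / channel_uses


if __name__ == "__main__":
    uses, rate = nominal_rate(segment_bits=228, delay_us=40_000, symbol_us=40)
    print(uses, rate)
```

The delay budget covers 1000 channel uses, and 228 bits spread over that many uses correspond to 0.228 bits per channel use, which the text rounds to 0.23.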

The maximum throughput that can be supported over the
Gilbert-Elliott channel with random codes of length $N = 114$
is slightly above 0.5 bits per channel use. Recall that threshold
$\eta$ represents a minimum target requirement on the number
of information bits per channel use that can be successfully
decoded at the destination, in an asymptotic regime. When
$\eta < 0.5$, there exist values of $K$ for which the rate function
$\frac{1}{N} I\left(\frac{N\eta}{K}\right)$ is strictly positive; this can be seen in Fig. 7.
These curves can be used to characterize the tension between
quantization and failures to deliver media properly. A high
quality stream, with a large $\eta$, will offer an enhanced viewer
experience when transmitted adequately, but will necessarily
be more prone to interruptions and failures, as exposed through
the rate functions. A low-bandwidth, low-quality stream on
the other hand offers a better delivery profile with a smaller
probability of failure. However, the quality of the playback
may not be satisfactory to the end user. A proper selection
of parameters for an adequate overall user experience can be
made through the rate functions of Fig. 7.

Fig. 7. This figure plots good rate functions governing large deviations in
the empirical mean service as functions of $K$, the number of information bits
per codeword. Given throughput threshold $\eta$, the optimal value of $K$ is the
argument corresponding to the apex of the function.

Fig. 8. This figure shows good rate functions governing large deviations in
the mean sojourn time as functions of $K$. The optimum code rate depends
heavily on the deviation threshold of the mean sojourn time.



Once $\eta$ is picked, the corresponding curve displays
performance as a function of $K$. For low code rates, the maximum
achievable throughput is less than the service requirement and
hence the rate function governing large deviations is zero.
At high code rates, performance is limited by the rise in the
probability of decoding failure. The system must then find the
right balance between the frequency of failures and the payoff
of a decoding success in terms of information bits. The optimal
value of $K$ for a specific threshold $\eta$ is given by the apex of
its curve,

$$K_Z^*(\eta) = \arg\max_{K} \frac{1}{N} I\left(\frac{N\eta}{K}\right).$$

It is interesting to note how conservative the optimal code rate
becomes when the target service requirement is reduced.

The second type of rate functions introduced in Section IV
characterizes large deviations in the mean sojourn times, as
shown in Fig. 8. These curves can be employed to trade off
playback quality and buffering times for streaming media.
More specifically, $\tau$ represents a limitation on the average
number of channel uses employed to transmit one bit of
information. Of course, when a high-quality rendering is
selected, the system must deliver a larger amount of data
within the buffering window and, hence, the probability of
delay violation becomes greater. The behavior of the system
in terms of average sojourn time is closely related to the
empirical mean service, holding a reciprocal relation. In this
case, the optimal value of $K$ becomes

$$K_Y^*(\tau) = \arg\max_{K} \frac{1}{K} \Lambda^*\left(\frac{K\tau}{N}\right).$$

We emphasize that the optimal code rates are equal, namely
$K_Z^*(\eta) = K_Y^*(\tau)$ whenever $\tau = \eta^{-1}$. This is due to the
relation between $I(\cdot)$ and $\Lambda^*(\cdot)$ described in Section IV-C.

The last aspect of this system we wish to explore is the
potential impact of channel memory and correlation among
successive channel uses. As before, we keep the probability
of a bit erasure at twenty percent. However, we vary the
decay factor of the channel, $(1 - b_{12} - b_{21})$, from zero to
one. Once again, we assess performance using the mean
first-passage time to an empty queue. When the channel is
memoryless, the optimal value for $K$ is 81. As correlation
increases, more protection against erasures is beneficial and
the optimal value of $K$ decreases moderately. This enables the
system to compensate for short sequences of erasures. Still, as
correlation strengthens, it becomes difficult to correct longer
strings of erasures. When this happens, the penalty of a smaller
payoff produced by a low rate code begins to dominate. In
other words, attempting to recover every packet starts to be
ineffective. Rather, the code rate must be selected to transmit
more information bits when the channel is favorable. As
$(1 - b_{12} - b_{21})$ approaches one, the optimal value of $K/N$ tends
to one as well. In the limit, the channel behaves much like a
packet erasure model: send as many bits as possible when
the channel is good and ask for retransmissions whenever the
message is corrupted. The data points that provide a basis for
these findings are summarized in Table I.



TABLE I

Optimal number of information bits per codeword as a
function of channel memory factor $1 - b_{12} - b_{21}$.

Channel    Optimal Value of K    Mean First-Passage Time E[H]    Crossing (0.96)
Memory     ARQ       HARQ        ARQ          HARQ               ARQ      HARQ
0          81        81          26.92        25.90              30       30
0.5        77        78          28.87        27.79              32       32
0.9        73        81          34.65        32.03              40       38
0.95       77        96          36.68        31.75              44       38
0.98       95        107         35.21        28.62              45       36


VII. Conclusions 

This article presents a methodology for the analysis and 
the design of digital communication systems that operate over 
channels with memory. The proposed approach is based on the 
time elapsed between the onset of the communication process
and its termination. Results also extend to the asymptotic
decay rates of mean service and mean sojourn time. Emphasis 
is on the selection of code rate for protection against erasures. 
We provide a simple mathematical characterization of the first- 
passage time to an empty queue and the large deviations on 
the mean service and mean transmission time, along with a 
computationally efficient means to compare the performance 
of various implementation candidates. 

The properties of coded systems are explored through a 
numerical study. Optimal code rates appear robust to initial 
buffer conditions at the transmitter. That is, the number of 
information bits to be sent from the source to the destination 
does not significantly affect the optimal operating point of the 
encoder. Optimal operation is achieved with very similar K 
values for mean first-passage times and various crossings of 
the cumulative distribution function. 

For both mean service rate and mean sojourn time, it seems
that the optimal operating point of a system in terms of code
rate selection depends heavily on the needs of the underlying
traffic. In particular, delay-averse applications may perform
better with coarse quantization and low-rate codes. On the
other hand, delay-tolerant applications may be able to use a
higher rate on the same physical channel. This phenomenon
is closely related to the concept of effective capacity.

Lastly, the optimal code rate depends heavily on channel 
memory. This suggests that, for systems with fixed block
lengths, the channel parameters should be estimated and fed 
back to the encoder for optimal operation. This naturally 
leads to adaptive strategies and possibly state-aware encoding 
schemes at the source. 

Appendix 

The appendix contains proofs for results that appear in the 
main body of this article. Although deemed important, these 
demonstrations are presented below so as not to disrupt the flow
of the paper. Each section heading points to the statement 
being considered. 

A. Proof of Theorem 1

We begin this proof by introducing a convenient notation
for abstract sequences. Let $\{a_s\}$ be a discrete-time sequence
and assume that $r$ and $t$ are two integers with $r < t$. We use
$a_r^t$ to denote the subsequence $a_r, a_{r+1}, \ldots, a_t$.

Also, let $U_t = (C_{tN+1}, Q_t) \in \mathcal{C} \times \mathbb{N}_0$ for every $t \ge 0$. Since
$\{U_s\}_{s \ge 0}$ is a discrete-time stochastic process whose elements
take on values in a finite set, it suffices to show that

$$\Pr\left(U_{s+1} = u_{s+1} \mid U_0^s = u_0^s\right) = \Pr\left(U_{s+1} = u_{s+1} \mid U_s = u_s\right)$$

in order to prove that this process is Markov. In general, the
probability on the left hand side can be expressed as

$$\Pr\left(C_{(s+1)N+1} = i_{s+1} \mid U_0^s = u_0^s\right) \times \Pr\left(Q_{s+1} = q_{s+1} \mid U_0^s = u_0^s, C_{(s+1)N+1} = i_{s+1}\right).$$

We know that the state of the channel at the onset of codeword
$s + 1$, labeled $C_{(s+1)N+1}$, is conditionally independent of
the subsequence $Q_0^s$ and the channel states $C_1^{(s-1)N+1}$ given
$C_{sN+1}$. Thus, we get

$$\Pr\left(C_{(s+1)N+1} = i_{s+1} \mid U_0^s = u_0^s\right) = \Pr\left(C_{(s+1)N+1} = i_{s+1} \mid C_{sN+1} = i_s\right).$$

The length of the queue $Q_{s+1}$ at time $s + 1$ is either $Q_s$
or $Q_s - 1$, depending on whether a codeword is successfully
decoded at time $s$. For a non-empty queue, this depends solely
on the generated codebook and the channel realizations during
the transmission cycle of the codeword $s$. As such, we can
write

$$\Pr\left(Q_{s+1} = q_{s+1} \mid U_0^s = u_0^s, C_{(s+1)N+1} = i_{s+1}\right) = \Pr\left(Q_{s+1} = q_{s+1} \mid U_s = u_s, C_{(s+1)N+1} = i_{s+1}\right).$$

Collecting these two results, we conclude that {U s } possesses 
the Markov property. 

B. Proof of Proposition 1

Notice that the proposition is trivially true when $e > p$. The
only case of interest then corresponds to $e \le p$. We observe
that, through a change in indexing, we can write

$$\prod_{l=n}^{n+e-1} \left(1 - 2^{l-p-n}\right) = \prod_{l=0}^{e-1} \left(1 - 2^{l-p}\right).$$

As such, we readily see that $P_f(p + n, e + n)$ is monotonically
increasing in $n$. The difference between this function and
$P_f(p, e)$ is obtained as follows,

$$\begin{aligned}
P_f(p + n, e + n) - P_f(p, e)
&= \prod_{l=0}^{e-1} \left(1 - 2^{l-p}\right) - \prod_{l=0}^{n+e-1} \left(1 - 2^{l-n-p}\right) \\
&= \prod_{l=n}^{n+e-1} \left(1 - 2^{l-n-p}\right) - \prod_{l=0}^{n+e-1} \left(1 - 2^{l-n-p}\right) \\
&= \prod_{l=n}^{n+e-1} \left(1 - 2^{l-n-p}\right) \left(1 - \prod_{l=0}^{n-1} \left(1 - 2^{l-n-p}\right)\right) \\
&\le 1 - \prod_{l=0}^{n-1} \left(1 - 2^{l-n-p}\right) \overset{(1)}{\le} \sum_{l=0}^{n-1} 2^{l-n-p} \\
&\le \sum_{l=0}^{\infty} 2^{-l-1-p} = 2^{-p}.
\end{aligned}$$

Step (1) follows from an $n$-variable version of the inequality
$1 - (1 - p_1)(1 - p_2) \le p_1 + p_2$ where $0 \le p_1, p_2 \le 1$. This
concludes the demonstration.
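
The bound obtained in this proof can be checked numerically. The sketch below is only an illustration: it assumes that the failure probability has the product form suggested by the expressions above, $P_f(p, e) = 1 - \prod_{l=0}^{e-1} (1 - 2^{l-p})$ with $P_f(p, e) = 1$ for $e > p$, and the function names `failure_probability` and `shifted_gap` are hypothetical.

```python
def failure_probability(p: int, e: int) -> float:
    """Assumed form: P_f(p, e) = 1 - prod_{l=0}^{e-1} (1 - 2**(l - p)).

    For e > p the product contains the factor (1 - 2**0) = 0, so P_f = 1.
    """
    if e > p:
        return 1.0
    product = 1.0
    for l in range(e):
        product *= 1.0 - 2.0 ** (l - p)
    return 1.0 - product


def shifted_gap(p: int, e: int, n: int) -> float:
    """Difference P_f(p + n, e + n) - P_f(p, e) studied in the proof."""
    return failure_probability(p + n, e + n) - failure_probability(p, e)


if __name__ == "__main__":
    p, e = 5, 3
    # Each printed gap should be non-negative, non-decreasing in n,
    # and no larger than 2**(-p).
    for n in (0, 1, 2, 5, 10):
        print(n, shifted_gap(p, e, n), 2.0 ** (-p))
```

Under the stated assumption, the printed gaps should grow with $n$ while staying below $2^{-p}$, in line with the chain of inequalities above.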

C. Proof of Lemma [2] 

When $\lambda < -\log \varrho(\mathbf{K})$, the spectral radius of the matrix
$\mathbf{K}e^{\lambda}$ is strictly less than one and, consequently, the matrix
$\mathbf{I} - \mathbf{K}e^{\lambda}$ is invertible. The finiteness of $\mathbf{G}_T(e^{\lambda})$ immediately
follows. We then turn to the alternate case, which we prove
by contradiction.

Assume that, for some $\lambda \ge -\log \varrho(\mathbf{K})$, matrix $\mathbf{G}_T(e^{\lambda})$
exists over the non-negative real numbers. Note that this
condition implies $\varrho(\mathbf{K}) > 0$. For convenience, we wish to
work with the irreducible normal form of $\mathbf{K}$. That is,
there exists a permutation matrix $\mathbf{P}$ such that

$$\tilde{\mathbf{K}} = \mathbf{P}^T \mathbf{K} \mathbf{P} =
\begin{pmatrix}
\Psi_1 & \Psi_{12} & \cdots & \Psi_{1h} \\
0 & \Psi_2 & \cdots & \Psi_{2h} \\
\vdots & & \ddots & \vdots \\
0 & 0 & \cdots & \Psi_h
\end{pmatrix}$$

in which each $\Psi_i$ is either irreducible or a zero matrix. Of
course, this reordering also affects $\mathbf{M}$,

$$\tilde{\mathbf{M}} = \mathbf{P}^T \mathbf{M} \mathbf{P}.$$

However, this transformation does not alter the spectrum of
$\mathbf{K}$ or $\mathbf{M}$. We note that all the states corresponding to an
irreducible $\Psi_i$ belong to a same communicating class, which
we denote by $\mathcal{C}_i$. Looking at the block triangular structure
of $\tilde{\mathbf{K}}$, we gather that the eigenvalues of $\tilde{\mathbf{K}}$ correspond to the
union of the eigenvalues of $\Psi_1, \ldots, \Psi_h$. Thus, there exists an
integer $j$ such that $\varrho(\Psi_j) = \varrho(\mathbf{K})$.

Since matrix $\Psi_j$ is non-negative and irreducible, the Perron-
Frobenius theorem applies and there exists an eigenvector $v$,
with positive components, such that

$$v \Psi_j = \varrho(\Psi_j) v = \varrho(\mathbf{K}) v.$$

Without loss of generality, we can assume that $v$ is normalized
to one. Let $w$ be a probability distribution with weight $v$ over
the states associated with $\Psi_j$ and zero elsewhere, i.e.,

$$w = \begin{bmatrix} 0 & \cdots & 0 & v & 0 & \cdots & 0 \end{bmatrix}.$$


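Because $v$ is an eigenvector of $\Psi_j$, we have

$$w \left(\tilde{\mathbf{K}} e^{\lambda}\right)^t = \begin{bmatrix} 0 & \cdots & 0 & \left(\varrho(\mathbf{K}) e^{\lambda}\right)^t v & * & \cdots & * \end{bmatrix}$$

and, correspondingly,

$$w \sum_{t=0}^{\infty} \tilde{\mathbf{K}}^t e^{t\lambda} = \sum_{t=0}^{\infty} w \tilde{\mathbf{K}}^t e^{t\lambda}
= \begin{bmatrix} 0 & \cdots & 0 & \sum_{t=0}^{\infty} \left(\varrho(\mathbf{K}) e^{\lambda}\right)^t v & * & \cdots & * \end{bmatrix}.$$

We note that the multiplicative factor $\sum_{t=0}^{\infty} \left(\varrho(\mathbf{K}) e^{\lambda}\right)^t$ is
a divergent sum that increases to infinity. In fact, all the
components of $w \sum_{t=0}^{\infty} \tilde{\mathbf{K}}^t e^{t\lambda}$ corresponding to states that
are accessible from $\mathcal{C}_j$ must also diverge. Since by
assumption the elements of

$$\tilde{\mathbf{G}}_T(e^{\lambda}) = \sum_{t=0}^{\infty} \tilde{\mathbf{K}}^t e^{t\lambda} \tilde{\mathbf{M}} e^{\lambda}$$

remain finite, we conclude that any state accessible from $\mathcal{C}_j$
must lie in the nullspace of $\tilde{\mathbf{M}}$. This necessarily means that
$w \tilde{\mathbf{G}}_T(e^{\lambda}) = 0$ and, consequently, $w \tilde{\mathbf{G}}_T(1) = 0$ because
$\tilde{\mathbf{K}}$ and $\tilde{\mathbf{M}}$ are non-negative matrices. In other words, we
have created a valid probability distribution $w$ for which
$w \tilde{\mathbf{G}}_T(1) = 0$. Equivalently, in the original domain, we can
rewrite this equation as $w \mathbf{P}^T \mathbf{G}_T(1) = 0$. But this equation
violates our assumption that $T$ is finite almost surely. We then
conclude, by contradiction, that not all entries of $\mathbf{G}_T(e^{\lambda})$ are
finite when $\lambda \ge -\log \varrho(\mathbf{K})$.

D. Proof of Corollary 7

As a straightforward application of Lemma 2, we can show
that $\varrho(\mathbf{K}) < 1$. By design, we know that $T$ is finite almost
surely. Then, from the definition of the matrix generating
function $\mathbf{G}_T(z)$, we gather that

$$\left[\mathbf{G}_T(1)\right]_{ij} = E\left[\mathbf{1}_{\{C_{NT+1} = j\}} \mid C_1 = i\right] = \Pr\left(C_{NT+1} = j \mid C_1 = i\right).$$

That is, $\mathbf{G}_T(1)$ is a right stochastic matrix.

Since $\mathbf{K}$ is a substochastic matrix, we already have the
relation $\varrho(\mathbf{K}) \le 1$. We wish to show that, in the current
framework, this inequality is strict. Suppose that $\varrho(\mathbf{K}) = 1$.
Lemma 2 states that, if $\lambda = -\log \varrho(\mathbf{K}) = 0$, then not all
entries of $\mathbf{G}_T(e^{\lambda}) = \mathbf{G}_T(1)$ can be finite. In particular, $\mathbf{G}_T(1)$
cannot be a right stochastic matrix. This leads to an obvious
contradiction, which indicates that $\varrho(\mathbf{K}) < 1$, as desired.

E. Proof of Proposition 2

For the first part of this proof, we assume that $\lambda < -\log \varrho(\mathbf{K})$.
The spectral radius of $\mathbf{K}e^{\lambda}$ is then strictly less than one and,
as such, $(\mathbf{I} - \mathbf{K}e^{\lambda})$ is invertible. This implies that the matrix

$$\mathbf{G}_T(e^{\lambda}) = \sum_{t=0}^{\infty} \mathbf{K}^t e^{t\lambda} \mathbf{M} e^{\lambda} = \left(\mathbf{I} - \mathbf{K}e^{\lambda}\right)^{-1} \mathbf{M} e^{\lambda}$$

is well-defined over the real numbers. Under Assumption 2, we
know that $\mathbf{G}_T(1)$ is an irreducible matrix. This readily implies
that $\mathbf{G}_T(e^{\lambda})$ is also irreducible. We can therefore apply the
Perron-Frobenius theorem [22, Th. 3.1.1], whose asymptotic
properties lead directly to $\Lambda(\lambda)$.

For the second case, we suppose that $\lambda \ge -\log \varrho(\mathbf{K})$. By
Lemma 2, we know that at least one entry of $\mathbf{G}_T(e^{\lambda})$ is equal
to infinity. We can use the irreducibility of this matrix to argue
that each row in $\left(\mathbf{G}_T(e^{\lambda})\right)^k$ has at least one entry that is
infinite. Since $\pi_0$ is a probability distribution,

$$E\left[e^{\lambda(T_1 + \cdots + T_k)}\right] = \pi_0 \left(\mathbf{G}_T(e^{\lambda})\right)^k \mathbf{1} = \infty.$$

For any $m \ge k$, we have

$$\Lambda_m(m\lambda) = \log E\left[e^{m\lambda Y_m}\right] = \log E\left[e^{\lambda(T_1 + \cdots + T_m)}\right] \ge \log E\left[e^{\lambda(T_1 + \cdots + T_k)}\right].$$

Consequently, whenever $\lambda \ge -\log \varrho(\mathbf{K})$, we get

$$\Lambda(\lambda) = \lim_{m \to \infty} \frac{1}{m} \Lambda_m(m\lambda) = \infty,$$

as desired.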
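
The threshold behavior at $\lambda = -\log \varrho(\mathbf{K})$ and the stochastic property of $\mathbf{G}_T(1)$ can be illustrated on a toy example. The sketch below assumes the series form $\mathbf{G}_T(e^{\lambda}) = \sum_{t \ge 0} \mathbf{K}^t e^{t\lambda} \mathbf{M} e^{\lambda}$ used in these proofs; the two-state matrices `K` and `M` are hypothetical values chosen only so that their sum is row-stochastic, and the helper names are not taken from the article.

```python
import math

# Hypothetical two-state example: K collects transitions without a decoding
# success, M collects transitions with a success; K + M is row-stochastic.
K = [[0.3, 0.1], [0.2, 0.2]]
M = [[0.5, 0.1], [0.1, 0.5]]


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def spectral_radius(a):
    """Spectral radius of a 2x2 real matrix from its trace and determinant."""
    tr = a[0][0] + a[1][1]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    disc = tr * tr - 4.0 * det
    if disc >= 0.0:
        root = math.sqrt(disc)
        return max(abs((tr + root) / 2.0), abs((tr - root) / 2.0))
    return math.sqrt(det)  # complex-conjugate pair: |eigenvalue|^2 = det


def closed_form(lam):
    """(I - K e^lam)^{-1} M e^lam, valid for lam < -log(spectral_radius(K))."""
    z = math.exp(lam)
    a = [[1.0 - K[0][0] * z, -K[0][1] * z],
         [-K[1][0] * z, 1.0 - K[1][1] * z]]
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    inv = [[a[1][1] / det, -a[0][1] / det],
           [-a[1][0] / det, a[0][0] / det]]
    return [[v * z for v in row] for row in matmul(inv, M)]


def partial_sum(lam, terms):
    """Truncated series sum_{t=0}^{terms-1} K^t e^{t lam} M e^lam."""
    z = math.exp(lam)
    power = [[1.0, 0.0], [0.0, 1.0]]  # holds (K z)^t, starting at t = 0
    total = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(terms):
        term = matmul(power, M)
        total = [[total[i][j] + term[i][j] * z for j in range(2)]
                 for i in range(2)]
        power = [[v * z for v in row] for row in matmul(power, K)]
    return total


if __name__ == "__main__":
    rho = spectral_radius(K)
    print("spectral radius:", rho, "threshold:", -math.log(rho))
    print("row sums of G_T(1):", [sum(row) for row in closed_form(0.0)])
    print("partial sums above threshold:",
          partial_sum(1.0, 100)[0][0], partial_sum(1.0, 200)[0][0])
```

For this choice of matrices, the rows of the closed form at $\lambda = 0$ should sum to one, whereas the truncated series keeps growing with the number of terms once $\lambda$ exceeds the threshold.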



F. Proof of Proposition 3

For the sake of completeness, we offer a brief proof for
Proposition 3. As an initial step for this demonstration, we
establish a few key properties. The processes $\{Y_m\}$ and $\{Z_s\}$
converge almost surely, i.e.,

$$Y_m = \frac{1}{m} \sum_{q=1}^{m} T_q \to \bar{T}, \qquad Z_s = \frac{1}{s} \sum_{t=1}^{s} D_t \to \bar{D},$$

where $\bar{T}$ and $\bar{D}$ are constants. Moreover, $\bar{T}$ and $\bar{D}$ have a
reciprocal relation, i.e., $\bar{T} = 1/\bar{D}$.

Recall that process $\{V_s = (C_{(s+1)N+1}, D_s)\}$ is a finite-
state Markov chain with irreducible transition probability
matrix $\Pi$. Also, $D_s = f(V_s)$ is a (trivial) bounded function.
Then, by the ergodic theorem for Markov chains [20], we have

$$\Pr\left(\lim_{s \to \infty} \frac{1}{s} \sum_{t=1}^{s} D_t = \bar{D}\right) = 1.$$



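Let $\Omega_1$ be the subset of $\Omega$ defined by

$$\Omega_1 = \left\{\omega \in \Omega : \lim_{s \to \infty} \frac{1}{s} \sum_{t=1}^{s} D_t(\omega) = \bar{D}\right\}.$$

Clearly, for any $\omega \in \Omega_1$, we necessarily have

$$N(s, \omega) = \sum_{t=1}^{s} D_t(\omega) \to \infty.$$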
t=i 

Consider the empirical average defined by

$$\frac{1}{m} \sum_{q=1}^{m} T_q(\omega).$$

We wish to show that this sequence converges almost surely
to $1/\bar{D}$ as $m$ increases to infinity. For any $\omega \in \Omega_1$, we have

$$\sum_{q=1}^{N(s,\omega)} T_q(\omega) \le s < \sum_{q=1}^{N(s,\omega)+1} T_q(\omega).$$

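As such, we get the inequality

$$\frac{1}{N(s,\omega)} \sum_{q=1}^{N(s,\omega)} T_q(\omega) \le \frac{s}{N(s,\omega)} \to \frac{1}{\bar{D}}. \tag{17}$$

In a similar fashion, we obtain

$$\frac{1}{N(s,\omega)+1} \sum_{q=1}^{N(s,\omega)+1} T_q(\omega) > \frac{s}{N(s,\omega)+1} = \frac{N(s,\omega)}{N(s,\omega)+1} \cdot \frac{s}{N(s,\omega)} \to \frac{1}{\bar{D}}.$$

It follows that, for any $\omega \in \Omega_1$, we get

$$\lim_{s \to \infty} \frac{1}{N(s,\omega)} \sum_{q=1}^{N(s,\omega)} T_q(\omega) = \frac{1}{\bar{D}}. \tag{18}$$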



To complete the proof, we must connect this result to our
original sequence of empirical averages. We emphasize that, for
any $\omega \in \Omega_1$ and for any $m \in \mathbb{N}$, there exists $s$ such that
$N(s, \omega) = m$ because $N(s)$ increases by at most one at every step.
It follows that the original sequence is a subsequence of the
convergent sequence in (18). They must then share the same limit.
Collecting these results, we gather that

$$\Pr\left(\lim_{m \to \infty} \frac{1}{m} \sum_{q=1}^{m} T_q = \frac{1}{\bar{D}}\right) = 1.$$
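
The reciprocal relation between the two empirical means can be observed in a short simulation. The sketch below is a simplification and not the channel model of the article: it assumes a memoryless service process in which each codeword is decoded independently with a hypothetical probability `d`, and the function names are illustrative.

```python
import random


def simulate_departures(d, s, seed=0):
    """Draw s i.i.d. Bernoulli(d) decoding indicators D_1, ..., D_s."""
    rng = random.Random(seed)
    return [1 if rng.random() < d else 0 for _ in range(s)]


def service_times(departures):
    """Inter-success times T_1, ..., T_{N(s)} implied by the indicators."""
    times, last = [], 0
    for t, success in enumerate(departures, start=1):
        if success:
            times.append(t - last)
            last = t
    return times


def empirical_means(d, s, seed=0):
    """Return Z_s, Y_{N(s)}, the sum of service times, and N(s)."""
    departures = simulate_departures(d, s, seed)
    times = service_times(departures)
    n = sum(departures)        # N(s): packets delivered by time s
    z = n / s                  # empirical mean service Z_s
    y = sum(times) / n         # empirical mean service time Y_{N(s)}
    return z, y, sum(times), n


if __name__ == "__main__":
    z, y, total, n = empirical_means(0.7, 100000, seed=1)
    # The sum of the first N(s) service times never exceeds s, so Y <= 1/Z.
    print(z, y, 1.0 / z, total)
```

In this simplified setting, the sum of the first $N(s)$ service times is the time of the last success, which is at most $s$; the product of the two empirical means therefore approaches one from below as $s$ grows.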



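As a side note, it is possible to show that

$$\bar{D} = E_{\pi_D}[D_t] = \pi_D \mathbf{M} \mathbf{1}, \qquad
\bar{T} = E_{\pi_T}[T_q] = \pi_T \left.\frac{d}{d\lambda} \mathbf{G}_T(e^{\lambda})\right|_{\lambda=0} \mathbf{1},$$

where $\frac{d}{d\lambda} \mathbf{G}_T(e^{\lambda})$ denotes the entrywise derivative. Above, $\pi_D$
and $\pi_T$ represent the invariant distributions of the channel and
the stochastic matrix $\mathbf{G}_T(1)$, respectively.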

Our strategy to finish this proof is to establish the claimed
result for rational numbers, and then invoke continuity to get a
full characterization. From our hypotheses, we know that the
rate functions $\Lambda^*(\cdot)$ and $I(\cdot)$ are finite in the open intervals
$(1, \infty)$ and $(0, 1)$, respectively. We note that these functions
are also convex over these intervals and, hence, continuous.
Let $r = p/q$, where $p, q \in \mathbb{N}$, be a rational number less than
one. Recall that $I(\cdot)$ is convex and, hence, continuous over
$(0, 1)$. Then, for every $\epsilon > 0$, there exists $\delta > 0$ such that

$$\begin{aligned}
-I(r) - \epsilon &\le \liminf_{n \to \infty} \frac{1}{np} \log \Pr\left(Z_{np} \in (r - \delta, r + \delta)\right) \\
&\le \limsup_{n \to \infty} \frac{1}{np} \log \Pr\left(Z_{np} \in (r - \delta, r + \delta)\right) \le -I(r) + \epsilon.
\end{aligned}$$

Taking the limit as $\delta \to 0$, we get

$$\begin{aligned}
&\lim_{\delta \to 0} \liminf_{n \to \infty} \frac{1}{np} \log \Pr\left(Z_{np} \in (r - \delta, r + \delta)\right) \\
&\quad = \lim_{\delta \to 0} \limsup_{n \to \infty} \frac{1}{np} \log \Pr\left(Z_{np} \in (r - \delta, r + \delta)\right) = -I(r).
\end{aligned}$$

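A similar argument applies to $\{Y_m\}$. Noting that $q/p \in
(1, \infty)$, we gather that $\Lambda^*(\cdot)$ is continuous in a neighborhood
of $1/r$. Then, for every $\epsilon > 0$, there exists $\delta > 0$ such that

$$\begin{aligned}
-\Lambda^*\left(\frac{1}{r}\right) - \epsilon
&\le \liminf_{n \to \infty} \frac{1}{nq} \log \Pr\left(Y_{nq} \in \left(\frac{1}{r} - \delta, \frac{1}{r} + \delta\right)\right) \\
&\le \limsup_{n \to \infty} \frac{1}{nq} \log \Pr\left(Y_{nq} \in \left(\frac{1}{r} - \delta, \frac{1}{r} + \delta\right)\right) \\
&\le -\Lambda^*\left(\frac{1}{r}\right) + \epsilon.
\end{aligned}$$

As before, this implies that

$$\begin{aligned}
&\lim_{\delta \to 0} \liminf_{n \to \infty} \frac{1}{nq} \log \Pr\left(Y_{nq} \in \left(\frac{1}{r} - \delta, \frac{1}{r} + \delta\right)\right) \\
&\quad = \lim_{\delta \to 0} \limsup_{n \to \infty} \frac{1}{nq} \log \Pr\left(Y_{nq} \in \left(\frac{1}{r} - \delta, \frac{1}{r} + \delta\right)\right)
= -\Lambda^*\left(\frac{1}{r}\right).
\end{aligned}$$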



We stress that the rate functions $\Lambda^*(\cdot)$ and $I(\cdot)$ vanish at $\bar{T}$
and $\bar{D}$, respectively.

At this point, we need to consider two separate cases. First,
suppose $r < \bar{D}$. We know that $I(\cdot)$ is a non-increasing function
over interval $[0, \bar{D})$ (see, e.g., [22, Lemma 2.2.5]). Also, in an
analogous manner, rate function $\Lambda^*(\cdot)$ is non-decreasing over
$(\bar{T}, \infty)$. Leveraging the ordering between $s$ and the partial sums
of service times established above, we can write

$$\Pr\left(\frac{1}{pn} \sum_{i=1}^{pn} T_i > \frac{q}{p}\right) = \Pr\left(\frac{1}{qn} \sum_{t=1}^{qn} D_t < \frac{p}{q}\right).$$

By letting $n$ go to infinity, we obtain

$$\inf_{x \in [\frac{1}{r}, \infty)} r\Lambda^*(x) = \inf_{x \in (0, r]} I(x).$$

Using the monotonic properties of these rate functions over
the prescribed intervals, we get

$$r\Lambda^*\left(\frac{1}{r}\right) = \inf_{x \in [\frac{1}{r}, \infty)} r\Lambda^*(x) = \inf_{x \in (0, r]} I(x) = I(r),$$

as desired.

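For the second case, assume $r > \bar{D}$. Under this constraint,
the monotonic properties of the rate functions are reversed.
That is, $I(\cdot)$ is non-decreasing over $[\bar{D}, 1)$ and $\Lambda^*(\cdot)$ is
non-increasing over $(0, \bar{T})$. Using these relations and the set
equalities

$$\Pr\left(\frac{1}{pn} \sum_{i=1}^{pn} T_i \le \frac{q}{p}\right) = \Pr\left(\frac{1}{qn} \sum_{t=1}^{qn} D_t \ge \frac{p}{q}\right),$$

we can write

$$r\Lambda^*\left(\frac{1}{r}\right) = \inf_{x \in (0, \frac{1}{r}]} r\Lambda^*(x) = \inf_{x \in [r, \infty)} I(x) = I(r).$$

Collecting these results, we deduce that $I(x) = x\Lambda^*\left(\frac{1}{x}\right)$
whenever $x \in \mathbb{Q} \cap (0, 1)$. Since the rational numbers are
dense in $(0, 1)$ and the two rate functions are continuous, this
equality must also hold for any real number in $(0, 1)$.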
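
The identity $I(x) = x\Lambda^*(1/x)$ can be verified numerically in a special case. The sketch below again assumes a memoryless service process, which is a simplification of the setting in the article: decoding indicators are i.i.d. Bernoulli with a hypothetical success probability `d`, so that service times are geometric. The rate function for the mean service is then the binary relative entropy, and the Legendre transform for the service time is evaluated by a ternary search; all function names are illustrative.

```python
import math


def rate_service(x, d):
    """Cramér rate function of Bernoulli(d) averages: KL(x || d)."""
    total = 0.0
    if x > 0.0:
        total += x * math.log(x / d)
    if x < 1.0:
        total += (1.0 - x) * math.log((1.0 - x) / (1.0 - d))
    return total


def log_mgf_geometric(lam, d):
    """log E[e^{lam T}] for T geometric on {1, 2, ...} with success prob. d."""
    return math.log(d) + lam - math.log(1.0 - (1.0 - d) * math.exp(lam))


def rate_sojourn(y, d, iterations=200):
    """Legendre transform sup_lam [lam * y - log_mgf] via ternary search."""
    def objective(lam):
        return lam * y - log_mgf_geometric(lam, d)

    lo = -50.0
    hi = -math.log(1.0 - d) - 1e-12  # the log-MGF is finite only below this
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1  # concave objective: the maximizer lies right of m1
        else:
            hi = m2
    return objective(0.5 * (lo + hi))


if __name__ == "__main__":
    d = 0.6
    for x in (0.2, 0.5, 0.6, 0.9):
        print(x, rate_service(x, d), x * rate_sojourn(1.0 / x, d))
```

In this special case, the two printed columns should agree up to numerical precision, and both vanish at $x = d$, which plays the role of $\bar{D}$.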
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